


THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 





Volume XXIV September, 1933 Number 6 













THE INFLUENCE OF INCREASE AND DECREASE OF 
THE AMOUNT OF REWARD UPON THE RATE OF 
LEARNING* 


EDWARD L. THORNDIKE AND GEORGE FORLANO 
Teachers College, Columbia University 


EXPERIMENT A 


The material for learning consisted of a series of lines, each con- 
sisting of a Spanish word and five English words one of which was 
the correct meaning for the Spanish word. There were three series, 
called CI, CII, and CIII, each of one hundred such lines. CIII was
used only in fore-exercises; lines one to eighty of CI and one to eighty
of CII were used in the actual learning. The experimenter followed 
exactly the following instructions: 


Instructions for Experiment A: Program I 


Say to the subject, “You are to learn to find the right meanings
for certain Spanish words. Look at this page (showing him the blank
CIII). In line one the Spanish word is ‘abada.’ You are to learn
which of the five English words it means. At the beginning you will
just have to guess. Then I will say Right if you are right and give
you one of these tokens. If you are wrong, I will say Wrong and give
you nothing. After we have been through the words once, we will
do them again, and this time you will guess again and get more right
because you will remember some of those that you got right before,
and also guess new ones. Then we will go through the twenty words
again and this time you will get still more right, probably. At the
end of the learning, I will give you a cent for every ten tokens that





*The experiments reported in this article were part of an investigation of
interests and motives supported by a grant from the Carnegie Corporation.










you have, that is, each token will be worth one-tenth of a cent. Now
we will begin.”

Take sheet CI, words one to twenty. Have the subject underline. 
Say Right or Wrong. Give him one token if he is right and make sure 
that he underlines in the next line without delay, and so on. Just 
as soon as you say Right or Wrong, he is to go ahead and underline 
in the next line. As soon as he has finished one trial, repeat with the 
same twenty words. Do this again. Do this again, making four in 
all as stated in the program. 

Then say, “Now we will try with another twenty Spanish words
and this time I will give you two tokens every time you get a word
right, so that you will make twice as much as you did before, if you
remember the words as well.” Take the sheet with words twenty-one
to forty of CI and give four trials as before.

Then say, “Now we will try twenty different words. This time I
will give you four tokens for every one that you get right, so that you
can earn four times as much as you did the first time, and twice as
much as you did the second time.” Give four trials, using forty-one
to sixty of CI.

Then say, “This time I will give you eight tokens for every one that
you get right, so that you will make twice as much as last time, if you
do as well.” Give four trials using words sixty-one to eighty of CI.

Then say, “This time I will give you eight tokens for each word
right just as we did last time.” Give four trials of CII, words one to
twenty.

Then say, “I wish I could keep on giving you eight tokens every
time, but it is costing too much, so next time it will have to be only
four.” Give four trials of CII, words twenty-one to forty.

Then say, “I am sorry that I have so little left, but I will have
to reduce and give you two tokens for each right this time.” Give
four trials of CII, words forty-one to sixty.

Then say, “I have to keep some money for the experiment with
the other boys, so this time I can give you only one token for each right,
just as we did at the beginning.” Give four trials of CII, words
sixty-one to eighty.

With each trial of twenty, put the tokens in one pile. When the
four trials are completed, cash them into real money, but leave them
in a pile before the pupil, so that, for example, at the beginning of
Session 1b, he will see a pile of say five cents and three tokens as his
earnings from Session 1a. As Session 1b progresses, he will see the







piles of tokens. This will grow bigger with each trial as he learns. 
At the end of the session, give him the money, but make a note of 
how much it was. At the beginning of the new session, proceed just as
before. 

There were ten subjects of the experiment, all boys aged ten to 
eleven years. For each of these subjects, the learning was divided into 
eight sessions in each of which twenty lines were done, then done again, 
then again, then again, as shown in the program below. 








PROGRAM I

Session |       | Repeat     |                    | For rights
        | CI    |            |                    |
   1    | 1-20  | Four times | Say right or wrong | Reward 1
   2    | 21-40 | Four times | Say right or wrong | Reward 2
   3    | 41-60 | Four times | Say right or wrong | Reward 4
   4    | 61-80 | Four times | Say right or wrong | Reward 8
        | CII   |            |                    |
   5    | 1-20  | Four times | Say right or wrong | Reward 8
   6    | 21-40 | Four times | Say right or wrong | Reward 4
   7    | 41-60 | Four times | Say right or wrong | Reward 2
   8    | 61-80 | Four times | Say right or wrong | Reward 1

















The number of correct meanings given out of twenty in each trial 
at each session is reported in Table I. There was learning under 
each condition of reward. Using the difference between the average 
score in Trial 1 and that in Trial 4 of the same session, we have the 
following: 


Reward of 1, in session 1, 2.0; in session 8, 3.6; average 2.80 
Reward of 2, in session 2, 2.7; in session 7, 3.8; average 3.25 
Reward of 4, in session 3, 4.9; in session 6, 3.3; average 4.10 
Reward of 8, in session 4, 3.5; in session 5, 4.2; average 3.75 
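
The figures above can be re-derived from Table I. The sketch below is only an illustration and not part of the original report: the names `GROUP_I`, `gain`, and `average_gain` are hypothetical, the values are the Group I averages as read from Table I, and the reward of 8 is omitted because its session 4 entries could not be read with certainty.

```python
# Trial 1 and Trial 4 averages (correct responses out of twenty) for Group I,
# keyed by reward and then by session, as read from Table I.
# The reward of 8 is left out: its session 4 entries are not legible.
GROUP_I = {
    1: {1: (4.2, 6.2), 8: (5.0, 8.6)},
    2: {2: (4.0, 6.7), 7: (4.3, 8.1)},
    4: {3: (3.3, 8.2), 6: (5.5, 8.8)},
}


def gain(trial_1: float, trial_4: float) -> float:
    """Gain within one session: Trial 4 average minus Trial 1 average."""
    return round(trial_4 - trial_1, 1)


def average_gain(sessions: dict) -> float:
    """Mean of the session gains for one size of reward."""
    gains = [gain(t1, t4) for t1, t4 in sessions.values()]
    return round(sum(gains) / len(gains), 2)


for reward, sessions in GROUP_I.items():
    per_session = {s: gain(t1, t4) for s, (t1, t4) in sessions.items()}
    print(reward, per_session, average_gain(sessions))
```

Each reward is met once on the way up and once on the way down, so averaging the two sessions roughly balances any effect of practice across sessions.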


With the same material and with ten new subjects of similar age 
and training Program II was operated, the procedure being similar to 
those described above but modified to fit the arrangement of Program 
II which was like Program I save that the order of the use of the 
rewards was 8, 4, 2, 1, 1, 2, 4, 8. 

The results appear in Table I. The average gains from rewards 
of 1, 2, 4 and 8, were, respectively, 2.1, 2.5, 3.6, and 2.4. 

With the same material and ten new subjects of similar age and
training, Program III was operated. Program III is like Programs I
and II save that the order of the use of the rewards was 4, 8, 1, 2, 2,
1, 8, 4. The results appear in Table I.

The average gains from Trial 1 to Trial 4 with rewards of 1, 2, 4 and 
8 were respectively, 2.9, 4.1, 4.2 and 3.6. 

For all three Programs together the average gains for rewards of 
1, 2, 4, and 8 are, respectively, 2.7, 3.3, 3.6 and 3.3. An increase of 
the reward from 4 to 8 seems not beneficial. 
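
The pooled figures can be checked against the Average row of Table I. A minimal sketch, assuming the Trial 1 and Trial 4 averages as read from that row; `AVERAGE_ROW` and `pooled_gain` are illustrative names, not the authors'.

```python
# Average row of Table I: (Trial 1, Trial 4) averages over all three groups,
# keyed by the size of the reward in tenths of a cent.
AVERAGE_ROW = {1: (4.5, 7.2), 2: (4.3, 7.6), 4: (4.2, 7.8), 8: (4.3, 7.6)}

# Pooled gain for each reward: Trial 4 average minus Trial 1 average.
pooled_gain = {r: round(t4 - t1, 1) for r, (t1, t4) in AVERAGE_ROW.items()}

# Reward with the largest pooled gain.
best_reward = max(pooled_gain, key=pooled_gain.get)

print(pooled_gain)
print(best_reward)
```

On these figures the largest pooled gain falls at the reward of 4, and the step from 4 to 8 lowers it.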


TABLE I.—THE AVERAGE NUMBER OF CORRECT RESPONSES OUT OF TWENTY,
MADE BY THREE GROUPS, EACH CONTAINING TEN BOYS, IN EACH OF FOUR
TRIALS AT CHOOSING THE RIGHT WORD OUT OF FIVE FOR EACH OF
TWENTY SPANISH WORDS, WHEN THE REWARD OF “RIGHT” WAS
SUPPLEMENTED BY ONE-, TWO-, FOUR-, OR EIGHT-TENTHS OF A CENT

(Sess. = session; T1 to T4 = Trials 1 to 4; [..] = figure not legible)

            |       Reward of 1        |       Reward of 2        |       Reward of 4        |       Reward of 8
            | Sess. T1   T2   T3   T4  | Sess. T1   T2   T3   T4  | Sess. T1   T2   T3   T4  | Sess. T1   T2   T3   T4
Group I     |   1   4.2  4.9  6.0  6.2 |   2   4.0  5.2  7.1  6.7 |   3   3.3  4.6  6.3  8.2 |   4   [..] [..] [..] [..]
            |   8   5.0  6.7  8.4  8.6 |   7   4.3  4.9  6.2  8.1 |   6   5.5  6.4  8.3  8.8 |   5   4.2  5.7  7.4  8.4
Group II    |   4   5.3  6.0  6.9  6.2 |   3   5.0  5.7  7.9  7.5 |   2   2.9  4.8  5.9  7.7 |   1   [..] [..] [..] [..]
            |   5   3.9  6.0  6.2  7.2 |   6   6.0  6.8  8.2  8.4 |   7   5.1  7.2  7.6  7.4 |   8   4.3  4.8  5.9  6.7
Group III   |   3   4.0  4.5  5.4  6.3 |   4   3.7  5.5  7.6  7.9 |   1   4.4  6.0  6.5  7.8 |   2   [..] [..] [..] [..]
            |   6   4.5  6.0  6.8  7.9 |   5   3.0  3.5  4.7  7.0 |   8   4.1  5.5  6.3  6.8 |   7   4.4  6.1  6.9  7.5
Average     |  ..   4.5  5.6  6.4  7.2 |  ..   4.3  5.3  6.9  7.6 |  ..   4.2  5.7  6.8  7.8 |  ..   4.3  5.4  6.5  7.6






























































EXPERIMENT B 


For comparison with these three sets of results with varying 
money-rewards, a fourth program was carried out, identical with 
Program III except that no distinctions were made in the amount of 
the rewards. The immediate reward was always the mere announcement
of “Right,” and there was a remote unlocalized reward in the
shape of a promised general payment of “a little money” for doing
the whole experiment. Ten new subjects (Group IV), comparable
to the others, served. Their average gains from Trial 1 to Trial 4
in Sessions 1 to 8 were respectively 0.4, 2.1, 3.2, 1.6, 1.8, 1.8, 2.6, 2.1,
and 1.2, averaging 1.9, with a standard error of 0.2. This is sixty-eight
per cent of the gain from a reward of 0.1 cent actually given at the 
time, and fifty-six per cent of the gain from a reward of 0.2, 0.4, or 
0.8 cent given at the time. 
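
The two percentages compare the Group IV gain with the gains under money rewards. The sketch below assumes the pooled gains from Experiment A (3.3, 3.6, and 3.3 for rewards of 2, 4, and 8). The reference gain of 2.8 for the reward of 1 is an assumption: it is the Program I average, and it reproduces the stated sixty-eight per cent, whereas the pooled figure of 2.7 would give about seventy. The function name `percent_of` is hypothetical.

```python
def percent_of(gain: float, reference_gain: float) -> int:
    """Express one gain as a whole-number percentage of a reference gain."""
    return round(100 * gain / reference_gain)


# Average gain of Group IV ("Right" alone plus a vague promise of money).
GAIN_RIGHT_ONLY = 1.9

# Pooled gains from Experiment A for rewards of 2, 4, and 8 tenths of a cent.
LARGER_REWARDS = {2: 3.3, 4: 3.6, 8: 3.3}
mean_larger = sum(LARGER_REWARDS.values()) / len(LARGER_REWARDS)

# Assumed reference gain for the reward of 1 (see the note above).
ASSUMED_GAIN_REWARD_1 = 2.8

print(percent_of(GAIN_RIGHT_ONLY, ASSUMED_GAIN_REWARD_1))
print(percent_of(GAIN_RIGHT_ONLY, mean_larger))
```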







EXPERIMENT C 


The same type of experimentation as that of Experiment A with 
rewards of 1, 2, 4, and 8, was carried on with the learning of which part 
of a figure (such as those shown in Figs. 1 and 2) was right. The 
subjects wore glasses so that the letters designating ‘‘Right” to 
the experimenter and printed in red could not be seen by them. The 



























































































































































































procedure was as shown by the instructions followed by the experi- 
menter, which were: 


Instructions for Experiment C 


Get the subject in a comfortable position. Give him a sheet of 
paper and a pencil. Start the timer and let him see if he can make the 
figure 1 or the letter O or the letter C or the letter X in time with its 
clicks. Put on his glasses and see if he can still make the letter X to 
keep up with the machine. When he seems to understand that, put 
four copies of the test before him in the A position (check to see that 
all are correctly arranged before you put them before him). Have 
him write his name in the upper right hand corner of each one and 
number them 1-1, 1-2, 1-3, 1-4. (Check again to see that they are 
correctly arranged and numbered.) 










Then say: “You see this sheet of pictures. Each one is divided
into six parts. When I say ‘Go,’ you are to mark an X in some one 
of the six parts of each picture beginning here (showing the upper left 
hand corner) and going across the page. The game is for you to 
make the mark in the right part. When you get one right, I will say 
‘Right,’ and when you get one wrong, I will say ‘Wrong.’ You must 
do it fast enough to keep up with the clicks on this timing machine. 
As soon as you finish one sheet, turn it over and do the next one. 
We will see how many you can learn to do right in four trials. Ready, 
Go!” 

On a separate piece of scratch paper keep tally of the number of 
“Rights” on each sheet. 

If necessary, have him go faster or slower to keep up with the timer. 
Be sure to say “Right” or “Wrong” as soon as he makes his mark and
do not let him lag to fix one in mind after you say “Right.”

As soon as the four are finished, take the sheets away. Count your 
tallies and tell him his score. 

Then put four sheets in position B. Have him sign them in the 
upper right hand corner and number them 2-1, 2-2, 2-3, 2-4. (Check 
them for position, signing and numbering.) The rest period thus used 
should be three minutes. 

Say: “This game is just like the other one except that this time
you are to make the letter C instead of X. Ready, Go!” 

Keep tally as before; remove the four sheets when finished, and 
report the number right. Encourage the child by saying that he is 
doing very well and that you hope he will do still better next time. 
Put four sheets in position C, have them signed and numbered 3-1, 
3-2, 3-3, 3-4. (Check them for correct position, signing and number- 
ing.) This rest period should be three minutes. 

Say: “The game is the same except this time make the letter O
instead of C or X. Ready, Go!” 

Again keep tally; remove the sheets when finished, and report 
the number right. Put four sheets in position D, have them signed 
and numbered 4-1, 4-2, 4-3, 4-4. (Check them for correct position.) 
The rest period should be three minutes. 

Say: “This is just like the others except that this time you are
to make the figure 1. Ready, Go!” 

Keep tally, remove sheets, report number right, and pay the 
bonus. 







N.B.—At no time let the child see the sheets without his glasses. 
If he takes them off during the rest period see that they are on again 
before placing the new sheets before him. 

The timer gave a click every four seconds. The money payments 
were not given to the subjects until the end of a session, but they 
understood clearly how much they would receive for each right 
response. The Programs were as follows: Group V used the set of 
square diagrams with rewards of 1 in session 1 (four trials in Position 
A), 2 in session 2 (four trials in Position B), 4 in session 3 (four trials 
in Position C) and 8 in session 4 (four trials in Position D). Group V 
had this same program for the set of long diagrams. Group VI had 
rewards of 8 where Group V had 1, 4 where Group V had 2, 2 where
Group V had 4, and 1 where Group V had 8. The results appear in
Table II.


TABLE II.—THE AVERAGE NUMBER OF CORRECT RESPONSES OUT OF TWENTY
MADE BY TWO GROUPS, EACH CONTAINING TEN BOYS, IN EACH OF FOUR
TRIALS AT CHOOSING THE RIGHT PART OF A PICTURE, FOR EACH OF
TWENTY PICTURES WHEN THE REWARD OF “RIGHT” WAS
SUPPLEMENTED BY A PROMISE OF ONE-, TWO-, FOUR-, OR
EIGHT-TENTHS OF A CENT

(Sess. = session; T1 to T4 = Trials 1 to 4; [..] = figure not legible)

            |       Reward of 1        |       Reward of 2        |       Reward of 4        |       Reward of 8
            | Sess. T1   T2   T3   T4  | Sess. T1   T2   T3   T4  | Sess. T1   T2   T3   T4  | Sess. T1   T2   T3   T4
Group V     |                          |                          |                          |
  Square    |   1   2.4  3.3  3.6  4.1 |   2   2.6  3.3  3.8  3.8 |   3   4.2  4.5  5.3  5.3 |   4   4.1  4.1  4.2  4.6
  Long      |   5   3.8  4.5  3.7  5.3 |   6   4.6  4.9  5.1  5.7 |   7   5.4  6.0  6.6  6.8 |   8   5.1  5.0  4.7  5.6
Group VI    |                          |                          |                          |
  Square    |   4   3.4  3.7  3.6  3.6 |   3   3.6  5.0  4.0  4.5 |   2   3.0  3.7  3.4  4.0 |   1   3.1  4.9  4.9  4.1
  Long      |   8   4.5  5.5  4.1  5.2 |   7   5.9  5.9  6.5  6.1 |   6   4.0  6.4  6.4  6.4 |   5   4.4  5.0  5.7  7.0
Average     |  ..   3.5  4.2  3.7  4.5 |  ..   [..] [..] [..] [..] |  ..   4.1  5.1  5.4  5.6 |  ..   4.2  4.7  4.9  5.3









































Unfortunately, with the set of long diagrams, the red ink used by
the printer for the key number was visible through the red glasses
worn by the subjects as slight streaks. Some of the subjects discovered
this and the learning in the case of the “long” sets is thus a
mixture of learning by the announcements of “Right” and learning
to get direct visual hints.

The average gains from Trial 1 to Trial 4 with the square set with 
rewards of 1, 2, 4, and 8, are, respectively, 0.95, 1.05, 1.05, and 0.75. 




Those for the long set are respectively 1.1, 0.65, 1.9 and 1.65. The 
averages for the two are 1.0, 0.9, 1.5 and 1.2. 
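
The square-set averages can be re-derived from Table II by averaging the Trial 1 to Trial 4 gains of Groups V and VI for each reward. A minimal sketch, assuming the square-set values as read from Table II; `SQUARE` and `mean_gain` are hypothetical names. The long set is left out here because its scores were affected by the visible key numbers.

```python
# (Trial 1, Trial 4) averages for the square diagrams, keyed by reward.
# The first pair in each list is Group V, the second is Group VI.
SQUARE = {
    1: [(2.4, 4.1), (3.4, 3.6)],
    2: [(2.6, 3.8), (3.6, 4.5)],
    4: [(4.2, 5.3), (3.0, 4.0)],
    8: [(4.1, 4.6), (3.1, 4.1)],
}


def mean_gain(pairs: list) -> float:
    """Average of the Trial 4 minus Trial 1 gains over the groups listed."""
    return round(sum(t4 - t1 for t1, t4 in pairs) / len(pairs), 2)


square_gains = {reward: mean_gain(pairs) for reward, pairs in SQUARE.items()}
print(square_gains)
```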

These results show the same failure of the 8 reward to be as effective 
as the 4 reward that was found in Experiment A. 


EXPERIMENT D 


Experiment D was in learning to connect a number from one to six 
with a word spoken by the experimenter. He used lists of twenty 
words, four trials of each, the words being said at a rate a little 
slower than that used in the experiment in marking the diagrams. 
The word lists were varied from subject to subject to avoid the 
possibility of any subject learning any of the numbers from other 
subjects. As in Experiment C, the money rewards were not actually 
given at the time. The results appear in Table III. 


TABLE III.—THE AVERAGE NUMBER OF CORRECT RESPONSES OUT OF TWENTY
MADE BY TWO GROUPS EACH CONTAINING TEN BOYS IN EACH OF FOUR
TRIALS AT CHOOSING THE RIGHT NUMBER OUT OF SIX FOR EACH OF
TWENTY ENGLISH WORDS, WHEN THE REWARD OF “RIGHT” WAS
SUPPLEMENTED BY ONE-, TWO-, FOUR-, OR EIGHT-TENTHS OF A CENT

(Sess. = session; T1 to T4 = Trials 1 to 4; [..] = figure not legible)

            |       Reward of 1        |       Reward of 2        |       Reward of 4        |       Reward of 8
            | Sess. T1   T2   T3   T4  | Sess. T1   T2   T3   T4  | Sess. T1   T2   T3   T4  | Sess. T1   T2   T3   T4
Group VII   |   1   [..] [..] [..] [..] |   2   [..] [..] [..] [..] |   3   [..] [..] [..] [..] |   4   3.5  3.9  4.2  4.6
Group VIII  |   4   2.6  3.9  4.3  4.9 |   3   4.0  4.1  4.4  5.4 |   2   3.1  4.1  4.9  5.0 |   1   3.3  3.6  4.3  5.5
































TABLE IIIA.—SAME AS THE ABOVE, BUT FOR ONE GROUP OF TEN BOYS, LEARNING
DURING 8 SESSIONS

(Sess. = session; T1 to T4 = Trials 1 to 4; [..] = figure not legible)

            |       Reward of 1        |       Reward of 2        |       Reward of 4        |       Reward of 8
            | Sess. T1   T2   T3   T4  | Sess. T1   T2   T3   T4  | Sess. T1   T2   T3   T4  | Sess. T1   T2   T3   T4
Group IX    |   1   [..] [..] [..] 5.0 |   2   3.2  [..] [..] 6.7 |   3   [..] [..] 4.8  5.0 |   4   2.9  4.8  5.9  5.4
Group IX    |   8   [..] [..] [..] 4.8 |   7   3.5  4.0  4.5  6.3 |   6   3.9  5.4  4.8  6.1 |   5   3.1  4.3  4.6  4.8































Ten boys (Group VII) who had rewards of 1, 2, 4, and 8 in sessions 
1, 2, 3, and 4 showed average gains from Trial 1 to Trial 4 of 0.4, 0.9, 
0.8 and 1.1 for rewards 1, 2, 4, and 8 respectively. 

Ten boys (Group VIII) who had rewards of 8, 4, 2, and 1 in sessions 
1, 2, 3, and 4, showed average gains from Trial 1 to Trial 4 of 2.3, 1.4, 
1.9 and 2.2 for rewards 1, 2, 4, and 8 respectively. The averages for 
both groups were 1.35, 1.15, 1.35, and 1.65. Here for the first time 
the reward of 8 seems the most advantageous of all. 


EXPERIMENT E 


Experiment E was the same as Experiment D except that the tokens
were actually given to the learner at the time as in Experiment A. A
group of ten boys had eight sessions of four trials each with twenty 
words, the rewards being 1, 2, 4, 8, 8, 4, 2, and 1, in that order. The 
results appear in Table IIIA. The average gains from Trial 1 to 
Trial 4 were, in order, 1.9, 3.5, 1.9, 2.6, 1.7, 2.2, 1.8 and 2.7. The 
average gains for rewards of 1, 2, 4, and 8 were 1.35, 2.15, 2.05, and 
2.15. 

If we express the gains due to “Right” + 0.1 cent, “Right” + 0.2
cent, “Right” + 0.4 cent, and “Right” + 0.8 cent as per cents of the
average for all four in the experiment in question, we have the following:


Experiment A............ .87, 1.02, 1.11, and 1.02
Experiment C............ .88, .71, 1.33, and 1.06
Experiment D............ .98, .83, .98, and 1.20
Experiment E............ .70, 1.11, 1.06, and 1.11


The unweighted averages are .86, .92, 1.12, and 1.10. Using weights 
in proportion to the number of occurrences, we have .86, .92, 1.15 and 
1.07. Using weights in proportion to the square root of the number of 
occurrences, we have .85, .92, 1.14, and 1.08. 
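
The unweighted averages can be reproduced by averaging each reward's relative gain over the four experiments. A minimal sketch, assuming the four rows of relative gains listed above; the values are held as whole hundredths so that the averages round cleanly, and `RELATIVE_GAINS` and `unweighted` are hypothetical names. The weighted averages are not reproduced because the numbers of occurrences used as weights are not listed.

```python
# Relative gains in hundredths for rewards of 1, 2, 4, and 8 tenths of a cent,
# one row per experiment, as listed in the text.
RELATIVE_GAINS = [
    (87, 102, 111, 102),
    (88, 71, 133, 106),
    (98, 83, 98, 120),
    (70, 111, 106, 111),
]

# Unweighted average for each reward: mean down each column, rounded to the
# nearest hundredth and converted back to a proportion.
unweighted = [round(sum(col) / len(col)) / 100 for col in zip(*RELATIVE_GAINS)]
print(unweighted)
```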

Right + 0.4 cent has a superiority over Right + 0.1 cent of about
0.27 with a probable error of about 0.07, and a superiority over Right +
0.2 cent of about 0.22 with a probable error of about 0.10, and a
superiority over Right + 0.8 cent of about 0.02 with a probable error of
about 0.07.

We can then be fairly certain that the increase of the money part 
of the reward acts to increase the rate of learning up to 0.4 cent, but 
the increase to 0.8 cent seems to do as much or more harm (by excite- 
ment or otherwise) as it does good by arousing and maintaining atten- 
tion to the work, satisfaction at success, etc. 




It is fairly certain that the giving of even 0.1 cent per success at 
the time increased the learning over the mere announcement of 
“Right” plus a vague promise of “a little money” for doing the whole
experiment. By Experiments A and B the gains from 0, 1, 2, 4, and 
8 were in the proportions 58, 86, 102, 111, and 102. 

It is certain that the learning is not proportional to the money
reward. Quadrupling the latter adds probably less than a third (and
surely less than two-thirds) to the gain, in the case of 0.1 and 0.4.
Quadrupling the money from 0.2 to 0.8 adds probably less than a fifth
to the gain, and surely less than half.
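
These fractions can be illustrated from the relative gains. The sketch below assumes that the unweighted averages (.86, .92, 1.12, and 1.10) are the basis of the comparison, which the text does not state outright; `RELATIVE_GAIN` and `added_fraction` are hypothetical names.

```python
# Unweighted average relative gains, keyed by reward in tenths of a cent.
RELATIVE_GAIN = {1: 0.86, 2: 0.92, 4: 1.12, 8: 1.10}


def added_fraction(low: int, high: int) -> float:
    """Fraction added to the gain when the reward rises from `low` to `high`."""
    return RELATIVE_GAIN[high] / RELATIVE_GAIN[low] - 1


# Quadrupling 0.1 cent to 0.4 cent, and 0.2 cent to 0.8 cent.
print(round(added_fraction(1, 4), 3))
print(round(added_fraction(2, 8), 3))
```

A fourfold increase in money thus goes with roughly a 30 per cent and a 20 per cent increase in gain on these assumed figures, far short of proportionality.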

It is fairly certain that a difference in money rewards acting as 
here throughout a period of several minutes is more influential than 
the same difference acting only from moment to moment as in the 
experiments of Rock.¹ In Rock’s experiments correct responses to
any of certain words in the list were rewarded by Right + 1 unit of 
money; correct responses to certain other words in the list were 
rewarded by Right + 2 units of money; correct responses to still others 
by Right + 4 units of money, and correct responses to still others by 
“Right” alone. The gains for these different groups of words differed 
in the direction of greater gain for greater money supplement, but the 
differences were small and not very reliable. Our materials and sub- 
jects were not strictly comparable to Rock’s, but the differences in 
these respects should not be very influential. 

The greater gain in these compared with that in his experiments is 
what should be expected. The two sets of experiments differ funda- 
mentally. In Rock’s, the influence measured is limited to the immedi- 
ate strengthening of a particular connection by rewards of varying 
magnitudes. In these, the influence measured is that plus also the 
strengthening of other neighboring right connections, plus also the 
strengthening of any right connections in the series by greater interest, 
attention and any other factors caused by the difference in rewards and 
pervading more or less of the learning of the series. 

So much for the general question concerning the influence of vary- 
ing rewards. We now turn to the question of the influence of shifts up 
and down in the amount of the money supplement. These have been 
nearly equalized in the measurements of the differences so far discussed. 

We now list the differences in average gains as the subjects were 
shifted from Right + 0.1 to Right + 0.2, from Right + 0.2 to Right + 
0.1, from Right + 0.2 to Right + 0.4, from Right + 0.4 to Right + 0.2, 


1 As yet unpublished. 










etc. We compare the changes from shifts up and from shifts down 
with each other and with the general differences due to corresponding 
differences in magnitude of the money supplement. The results 
appear in Table IV. At their face value they show shifts up and shifts
down to be nearly equally potent. But the results are very variable. 
So the general result needs to be tested by further experiments. 


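TABLE IV.—CHANGE IN AVERAGE GAINS FROM TRIAL 1 TO TRIAL 4 ACCOMPANYING
EACH CHANGE IN THE AMOUNT OF THE MONEY REWARD

([?] = figure not legible; .. = no entry)

      |             Changes in amount of money reward
Group | 1-2  | 2-1  | 2-4  | 4-2  | 4-8  | 8-4  | 1-8  | 8-1  | 1-1  | 2-2  | 8-8
I     |  0.7 | -0.2 |  2.2 |  0.5 | -1.6 | -0.9 |  ..  |  ..  |  ..  |  ..  |  0.9
II    | -0.9 | -1.6 | -0.1 | -2.3 |  0.1 |  2.5 |  ..  |  ..  |  2.4 |  ..  |  ..
III   |  [?] |  [?] |  ..  |  ..  |  0.7 | -0.4 | -0.3 | -1.8 |  ..  | -0.2 |  ..
V     | -0.5 |  ..  | -0.1 |  ..  |  [?] |  ..  |  ..  |  1.0 |  ..  |  ..  |  ..
      | -0.4 |  ..  |  [?] |  ..  | -0.9 |  ..  |  ..  |  ..  |  ..  |  ..  |  ..
VI    |  ..  |  [?] |  ..  | -0.1 |  ..  |  0.0 |  ..  |  ..  |  ..  |  ..  |  ..
      |  ..  |  [?] |  ..  | -2.2 |  ..  | -0.2 |  2.4 |  ..  |  ..  |  ..  |  ..
VII   |  [?] |  ..  |  [?] |  ..  |  0.3 |  ..  |  ..  |  ..  |  ..  |  ..  |  ..
VIII  |  ..  |  [?] |  ..  | -0.5 |  ..  | -0.3 |  ..  |  ..  |  ..  |  ..  |  ..
IX    |  1.6 | -1.0 | -1.6 | -0.4 |  0.6 |  0.5 |  ..  |  ..  |  ..  |  ..  | -0.8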








































THE INFLUENCE OF READING ABILITY ON 
INTELLIGENCE MEASURES 


DONALD D. DURRELL¹


Boston University 


With few exceptions, group intelligence tests in common use have 
a preponderance of items demanding reading ability. Measures 
of intelligence should be largely independent of variable environmental 
factors. Makers of intelligence tests have apparently assumed that in 
a system of compulsory education all children have had equal oppor- 
tunity to learn to read and that achievement in reading is in proportion 
to the native intellectual ability of the child. It is the purpose of 
this study to determine whether such assumptions are justified and 
whether reading ability affects intelligence test scores significantly. 

During the year 1930, Stanford-Binet tests were given to all 
children of the Harvard Growth Study. While all of the examiners 
giving the Stanford-Binet were experienced in the use of the test, the 
uniformity of administration and scoring was checked by the writer. 
Since the Stanford-Binet is relatively independent of items demanding
reading ability and since it is often used as a criterion for determining 
the validity of group intelligence tests, it is used in this study as the 
basic measure of intelligence. Reading ability was determined by 
Stanford Achievement Reading Test, Form B, the Chapman-Cook 
Speed of Reading Test, and the Burgess Silent Reading Test. All 
children were in their sixth school year. 

Of the one thousand one hundred thirty children with complete 
records on all tests, 28.7 per cent were found to have reading ages one 
year or more above their Stanford-Binet mental ages, 15.2 per cent had 
reading ages one year or more below their Stanford-Binet mental 
ages, and the remainder, 56.1 per cent, obtained reading ages within 
one year of their respective mental ages. These data would indicate
that reading ability and mental ability, determined by the tests 
used, do not show equal growth for a large proportion of the children 
studied. The fact that children have spent equal amounts of time in 
school does not assure that they have made gains in school subjects 
in exact proportion to their mental ability. 





1 This study was made as a part of a doctor’s dissertation at Harvard University 
under the direction of Dr. W. F. Dearborn. (Unpublished 1930.) 







Haggerty Intelligence Examination Delta II was used in two 
cities where the study was made. The Haggerty Delta II consists 
of seven tests; six tests with a total of one hundred seventy-six items 
involving reading ability, and one test of twenty items which demands 
no reading ability. In order to determine the effect of reading ability 
on Haggerty Delta II, the Haggerty IQ’s were compared with the
Stanford-Binet IQ’s of children with different levels of achievement in
reading. Table I shows such a comparison for children who were
reading one year or more above their Stanford-Binet mental ages. 


TABLE I.—BINET IQ’S COMPARED TO HAGGERTY IQ’S OF TWO HUNDRED TWENTY-
FOUR CHILDREN WITH SUPERIOR READING ACCOMPLISHMENT

             | Mean   | PE_M | Difference | PE_diff. | Difference / PE_diff.
Binet IQ     |  92.09 |  .74 |   16.83    |   1.19   |   14.2
Haggerty IQ  | 108.92 |  .94 |            |          |




















The mean Haggerty IQ is 16.8 points above the mean Binet IQ for these 
children. 


A similar comparison was made for a group of thirty-six children 
who were reading two or more years above their Binet mental ages. 
The result is shown in Table II. 


TABLE II.—BINET IQ’S COMPARED WITH HAGGERTY IQ’S OF THIRTY-SIX
CHILDREN WITH VERY SUPERIOR READING ACCOMPLISHMENT

             | Mean   | PE_M | Difference | PE_diff. | Difference / PE_diff.
Binet IQ     |  93.7  | 1.82 |   20.8     |   2.92   |   7.13
Haggerty IQ  | 114.5  | 2.30 |            |          |




















Table III shows a comparison of the Haggerty IQ’s with the Binet
IQ’s of a group of one hundred children who had reading ages a
year or more below their Binet mental ages.

It may be well to observe at this point that the children with
superior reading accomplishment have lower IQ’s than the children
with low reading accomplishment. Almost every study of accomplishment
quotients shows that children with low IQ’s achieve more than
would be expected for their mental ages, and that children with high
IQ’s usually fail to make educational ages up to their mental ages.




TABLE III.—BINET IQ’S COMPARED WITH HAGGERTY IQ’S OF ONE HUNDRED
CHILDREN WITH LOW READING ACCOMPLISHMENT

             | Mean   | PE_M | Difference | PE_diff. | Difference / PE_diff.
Binet IQ     | 106.7  | 1.38 |    8.1     |   1.75   |   4.63
Haggerty IQ  |  98.6  | 1.15 |            |          |




















The mean Haggerty IQ is 8.1 points lower than the mean Binet IQ for these
children.


This accounts for the differences in Binet IQ’s between the children
with superior reading accomplishment quotients and those with poor 
reading accomplishment quotients. 

It is evident from the three tables above that the IQ’s derived from 
the Haggerty Intelligence Examination go higher or lower than the 
Binet IQ’s as the reading accomplishment is higher or lower. It 
appears that the reading achievement of an individual might make a 
difference of at least twenty-nine Haggerty IQ points—the difference 
between the most superior and the inferior reading group. When 
it is recalled that the reading age of 43.9 per cent of the children studied 
fell outside a range of one year from their Stanford-Binet mental ages, 
it will be seen that the Haggerty Delta II IQ will often show a marked 
divergence from the Binet IQ. A summary of the differences from the 
tables above is found in Table IV. 


TABLE IV.—BINET IQ’S COMPARED WITH HAGGERTY IQ’S FOR GROUPS OF
CHILDREN WITH DIFFERENT READING ACCOMPLISHMENT

Reading in relation to Binet mental age | Number in group | Mean Haggerty IQ minus mean Binet IQ | Difference / PE_diff.
Two years or more advanced              |        36       |                 20.8                 |   7.13
One year or more advanced               |       224       |                 16.8                 |  14.20
One year or more retarded               |       100       |                 -8.1                 |   4.63
Within one year (normal)                |       449       |                  2.4                 |   2.89
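
The last column of Tables I to IV divides each difference by its probable error. The sketch below is an illustration under stated assumptions: the source does not say how the probable error of the difference was obtained, so `pe_of_difference` uses the usual formula for two uncorrelated means, which comes close to the value reported in Table II but need not be the author's method. The function names are hypothetical.

```python
import math


def pe_of_difference(pe_a: float, pe_b: float) -> float:
    """Probable error of a difference of two means, assuming no correlation."""
    return math.sqrt(pe_a ** 2 + pe_b ** 2)


def critical_ratio(mean_a: float, mean_b: float, pe_diff: float) -> float:
    """Absolute difference between two means divided by its probable error."""
    return abs(mean_a - mean_b) / pe_diff


# Table II: Binet mean 93.7 (PE 1.82), Haggerty mean 114.5 (PE 2.30),
# reported PE of the difference 2.92, reported ratio 7.13.
estimated_pe = pe_of_difference(1.82, 2.30)
ratio = critical_ratio(114.5, 93.7, 2.92)

print(round(estimated_pe, 2))
print(round(ratio, 2))
```

The larger the ratio, the less likely it is that the difference between the two tests arose by chance.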

















The Otis Self-Administering Intermediate Examination contains 
seventy-five items, all of which require reading. It was given to 
three hundred fifty-seven children who had previously had the Stan- 
ford-Binet and the reading tests listed above. 


Table V shows a













comparison of the Otis IQ with the Binet IQ of one hundred children 
who were reading one year or more above their Binet mental ages. 


TABLE V.—STANFORD-BINET IQ’S COMPARED WITH OTIS SELF-ADMINISTERING
IQ’S OF ONE HUNDRED CHILDREN WITH SUPERIOR READING ACCOMPLISHMENT

([..] = figure not legible)

             | Mean   | PE_M | Difference | PE_diff. | Difference / PE_diff.
Binet IQ     |  96.9  | 1.07 |   10.0     |   1.77   |   5.65
Otis IQ      |  [..]  | [..] |            |          |




















Table VI shows the difference between the mean Binet IQ and the 
mean Otis IQ of a group of seventy-one children who were reading one 
year or more below their Binet mental ages. 


TABLE VI. BINET IQ’S COMPARED WITH OTIS SELF-ADMINISTERING IQ’S OF
SEVENTY-ONE CHILDREN WITH LOW READING ACCOMPLISHMENT

Test                    | Mean  | PE M | Difference | PE diff. | Difference / PE diff.
Stanford-Binet          | 105.5 | 1.30 | 5.6        | 2.05     | 2.75
Otis Self-Administering |  99.9 | 1.59 |            |          |
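
The probable error of the difference in Table VI agrees with the usual formula for the difference of two uncorrelated means, namely the square root of the sum of the squared probable errors. The following Python sketch is an illustration only and not part of the original computation; the function names are hypothetical, and the assumption that the two means were treated as uncorrelated is inferred from the agreement with the tabled value.

```python
import math

def pe_of_difference(pe_a: float, pe_b: float) -> float:
    """Probable error of a difference of two means, assuming the means are uncorrelated."""
    return math.sqrt(pe_a ** 2 + pe_b ** 2)

def critical_ratio(difference: float, pe_diff: float) -> float:
    """Ratio of a difference to its probable error."""
    return difference / pe_diff

# Table VI: probable errors of the two mean IQ's (1.30 and 1.59).
pe_diff_table_vi = pe_of_difference(1.30, 1.59)

# Table V: difference 10.0 with probable error 1.77.
ratio_table_v = critical_ratio(10.0, 1.77)
```

Rounded to two decimals, the first quantity reproduces the tabled probable error of the difference, and the second reproduces the tabled ratio for the superior readers.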




















While the difference between the Stanford-Binet and the Otis is 
less in both groups than between the Stanford-Binet and the Haggerty 
for similar groups, the tendency for the Otis to follow the reading 
achievement is marked. A summary of the data regarding the Otis 
test appears in Table VII. 


TABLE VII. STANFORD-BINET IQ’S COMPARED WITH OTIS SELF-ADMINISTERING
IQ’S FOR GROUPS OF CHILDREN WITH DIFFERENT READING ACCOMPLISHMENT

Reading in relation to Binet mental age | Number in group | Mean Otis IQ minus mean Binet IQ | Difference / PE diff.
One year or more advanced               | 100             | 10.0                             | 5.65
One year or more retarded               |  71             | -5.6                             | 2.75
Within one year (normal)                | 186             |  4.4                             | 3.43





The IQ’s from the group tests studied appear to vary to a significant 
degree with the reading accomplishment of the group examined. Since 
school success depends to a large extent on reading ability, the presence
of reading items in a test will not necessarily invalidate it as a measure
for the prediction of school success if the reading accomplishment of 
the child is relatively constant. Yet it is true that the presence of 
this large factor of reading in intelligence tests will allow many children 
to be classed as dull who are really normal or bright but who have 
poor reading ability. It follows that the group intelligence test involv- 
ing a great number of reading items should not be used as a basis for 
intelligence or accomplishment quotients. It appears to be a reading 
test incorrectly labeled. 







ANALYSIS OF A COMPLEX OF STATISTICAL
VARIABLES INTO PRINCIPAL COMPONENTS¹


HAROLD HOTELLING 
Columbia University 
1. INTRODUCTION 


Consider n variables attaching to each individual of a population.
These statistical variables $x_1, x_2, \ldots, x_n$ might for example be
scores made by school children in tests of speed and skill in solving
arithmetical problems or in reading; or they might be various physical
properties of telephone poles, or the rates of exchange among various
currencies. The $x$’s will ordinarily be correlated. It is natural to
ask whether some more fundamental set of independent variables
exists, perhaps fewer in number than the $x$’s, which determine the
values the $x$’s will take. If $\gamma_1, \gamma_2, \ldots$ are such variables, we shall
then have a set of relations of the form

$$x_i = f_i(\gamma_1, \gamma_2, \ldots) \qquad (i = 1, 2, \ldots, n). \tag{1}$$


Quantities such as the $\gamma$’s have been called mental factors in
recent psychological literature. However, in view of the prospect of
application of these ideas outside of psychology, and the conflicting
usage attaching to the word “factor” in mathematics, it will be better
simply to call the $\gamma$’s components of the complex depicted by the
tests.

We shall consider only normally distributed systems of com- 
ponents having zero correlations and unit variances. If we use 
the symbol E to denote the expectation, or mean value in the popu- 
lation, of the quantity following it, the condition that the means 
shall be zero is expressed by 


$$E\gamma_i = 0.$$


The assumptions of unit variances and zero correlations may be 
combined in the statement 





¹ A study made in part under the auspices of the Unitary Traits Committee
and the Carnegie Corporation. 

The author is indebted to Professor Truman L. Kelley, who was responsible 
for the initiation of this study and the propounding of many of the questions to 
which answers are here attempted; also to Professors L. L. Thurstone, Clark V. 
Hull, C. Spearman, and E. L. Thorndike, who raised some of the further questions 
treated. 




$$E\gamma_i\gamma_j = \delta_{ij}, \tag{2}$$

where $\delta_{ij}$, the so-called Kronecker delta, equals unity if $i$ equals $j$,
zero if they are unequal.

If, following the notation of T. L. Kelley, we express the $x$’s in
“standard measures,” by taking the deviation of each from its mean
value and dividing by its standard deviation, we obtain a set of
quantities $z_1, z_2, \ldots, z_n$ for which our formulas will be simpler.
Confining ourselves to the case in which the functions $f_i$ are linear,
the equations (1) then take the form

$$z_i = \sum_j a_{ij}\gamma_j, \tag{3}$$


constant terms disappearing because both the $z$’s and $\gamma$’s have zero
means. The summation will be taken from 1 to n; this will include
as special cases situations in which there are fewer components
than tests, since some of the $a_{ij}$’s may be zero. However, we shall
assume in what immediately follows that this is not the case, and
that the determinant $a$ of the $a_{ij}$’s is not zero.
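
The passage to standard measures may be illustrated by a short Python sketch. It is offered only as an illustration: the function name and the sample scores are hypothetical, and the use of the divisor n (rather than n - 1) in the standard deviation is an assumption of the sketch.

```python
import math

def standard_measures(scores: list[float]) -> list[float]:
    """Convert raw scores x to standard measures z = (x - mean) / sd."""
    n = len(scores)
    mean = sum(scores) / n
    # Standard deviation with divisor n, an assumption of this sketch.
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return [(x - mean) / sd for x in scores]

# Hypothetical raw scores with mean 5 and standard deviation 2.
z = standard_measures([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# The standard measures have zero mean and unit variance.
mean_z = sum(z) / len(z)
variance_z = sum(v * v for v in z) / len(z)
```

With the scores so expressed, the correlation between two tests is simply the mean product of their standard measures, which is what makes the later formulas simpler.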

We shall make use of the tensor analysis convention that the 
repetition of a literal subscript in a term shall, unless otherwise 
explicitly indicated, denote summation with respect to that subscript 
from 1 to n. This not only saves writing a large number of summation
signs, but has a mnemonic value in helping to indicate what to do
next. According to this convention we write (3) in the form:

$$z_i = a_{ij}\gamma_j. \tag{3}$$

Let $A_{ij}$ denote the cofactor of $a_{ij}$ in $a$, divided by $a$. Then by
the elementary theory of determinants,

$$a_{ij}A_{ik} = \delta_{jk}, \qquad a_{ij}A_{kj} = \delta_{ik}. \tag{4}$$

We may solve (3) for the $\gamma$’s by multiplying both sides by $A_{ik}$,
summing with respect to $i$ from 1 to n, and using (4). Since $\delta_{jk}\gamma_j$
is a sum consisting of terms which all vanish except $\gamma_k$, this gives:

$$\gamma_k = A_{ik}z_i. \tag{5}$$

Let $r_{ik}$ be the correlation between $z_i$ and $z_k$, equal to unity if $i = k$.
This is the same as the correlation between $x_i$ and $x_k$; and

$$r_{ik} = Ez_iz_k.$$

Here substitute the value for $z_i$ given by (3), and for $z_k$ an expression
obtained from (3) by replacing $i$ by $k$, and $j$ by $l$. With the help of
(2) we then obtain:


$$r_{ik} = a_{ij}a_{kl}E\gamma_j\gamma_l = a_{ij}a_{kl}\delta_{jl} = a_{ij}a_{kj}. \tag{6}$$


Since $r_{ik} = r_{ki}$, the number of equations (6) is only $\tfrac{1}{2}n(n + 1)$.
They are therefore insufficient for determining the $n^2$ quantities
$a_{ij}$ when the correlations between the tests are known. Thus systems
of uncorrelated components $\gamma$ may be chosen, consistently with the
observed correlations, in $\infty^{n(n-1)/2}$ ways. This variety of choices of
components corresponds to the $\tfrac{1}{2}n(n - 1)$ degrees of freedom of a
rigid rotation in a space of n dimensions.
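
The indeterminateness may be seen in the simplest case, n = 2, where the rigid rotation has a single degree of freedom. The Python sketch below is an illustration under stated assumptions: the loading matrix, the angle, and the function names are hypothetical. It forms the correlations from the coefficients by (6), rotates the system of components through an arbitrary angle, and forms the correlations again.

```python
import math

def loadings_to_correlations(a):
    """r_ik = sum over j of a_ij * a_kj, as in equation (6)."""
    n = len(a)
    return [[sum(a[i][j] * a[k][j] for j in range(len(a[i]))) for k in range(n)]
            for i in range(n)]

def rotate_components(a, theta):
    """Replace the two components by a pair rotated through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[row[0] * c - row[1] * s, row[0] * s + row[1] * c] for row in a]

# Two tests with correlation 0.6, expressed by two uncorrelated components.
a = [[1.0, 0.0],
     [0.6, 0.8]]
r_before = loadings_to_correlations(a)
r_after = loadings_to_correlations(rotate_components(a, 0.7))
```

The two correlation tables agree to within rounding, although the coefficients themselves differ, so the observed correlations cannot distinguish one system of components from the other.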

It might be thought that additional equations for determining
the $a_{ij}$ could be obtained with the moments of higher order, or of
other parameters of the population. But if we retain our assumption
that the $z$’s are linearly compounded of normally distributed com-
ponents, this is not the case. Indeed, the $z$’s then have a multivariate
normal distribution; and every parameter of such a distribution is a
function of the means, variances, and covariances, whose available
information is fully embodied in the equations (6) and in the assump-
tion of standard measures. If for example we multiply together four
such equations as (3) and take the mean value of each side so as to
get an equation in the $a_{ij}$’s, this equation will, with the help of (6)
and the expressions for the fourth moments of a multiple normal
distribution, reduce to an identity.

Various modes of escape from the indeterminateness have been
considered. The number $n^2$ of unknowns $a_{ij}$ may be reduced by
supposing that there are fewer than n components, which amounts to
setting some of the $a_{ij}$ equal to zero. If carried far enough, this
results in fewer unknowns than equations, so that consistency condi-
tions upon $r_{ij}$ may be obtained. A similar situation arises from other
arbitrary specializations of the $a_{ij}$, the number of components possibly
even exceeding the number of tests. Thus Spearman, putting (in
different notation)

$$z_i = a_{i0}\gamma_0 + a_{ii}\gamma_i \qquad \text{(not summed for } i\text{)}, \tag{7}$$

obtained as consistency conditions the famous tetrad equations,

$$r_{ij}r_{kl} - r_{ik}r_{jl} = 0. \tag{8}$$

When these are satisfied for every set of different values of the sub-
scripts, the $a_{ij}$ appearing in (7) are determined uniquely except for
sign. Systems involving less specialization and leading to different
and fewer consistency conditions have been considered by Truman L.
Kelley in “Crossroads in the Mind of Man.”¹
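
The tetrad conditions may be checked numerically. The Python sketch below is illustrative only: the general-factor coefficients are hypothetical, the function names are not from the text, and the tetrad is taken in the form $r_{ij}r_{kl} - r_{ik}r_{jl}$ over all sets of distinct subscripts. When every correlation is the product of two general-factor coefficients the tetrads vanish; when a single correlation is altered they do not.

```python
from itertools import permutations

def one_factor_correlations(g):
    """Correlations implied by one general factor: r_ij = g_i * g_j for i != j."""
    n = len(g)
    return [[1.0 if i == j else g[i] * g[j] for j in range(n)] for i in range(n)]

def max_tetrad(r):
    """Largest absolute value of r_ij*r_kl - r_ik*r_jl over distinct i, j, k, l."""
    n = len(r)
    return max(abs(r[i][j] * r[k][l] - r[i][k] * r[j][l])
               for i, j, k, l in permutations(range(n), 4))

# Hypothetical coefficients of the general factor in four tests.
r = one_factor_correlations([0.9, 0.8, 0.7, 0.6])
tetrad_exact = max_tetrad(r)

# Alter one correlation so that a single general factor no longer suffices.
r_altered = [row[:] for row in r]
r_altered[0][1] = r_altered[1][0] = 0.5
tetrad_altered = max_tetrad(r_altered)
```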

The consistency conditions are of course never satisfied exactly 
in a sample. Whether the extent of their non-fulfillment in a sample
of given size is sufficient to render incredible their fulfillment in the 
population depends not only on the standard of credibility adopted, 
but also on the solution of mathematical problems whose study is still 
incomplete. Apart from this question of sampling, it may well be 
argued that it is unlikely that the conditions should be fulfilled exactly 
in the population, and that for sufficiently large samples tetrads such 
as the left member of (8) may confidently be expected to exceed any 
assigned multiple of their probable errors, just as the correlation 
between any two mental or physical measurements is not likely to be 
exactly zero. This argument is not necessarily conclusive, since small 
tetrads, even in the infinite population, may like very small correla- 
tions be treated as negligible, for economy of thought. It does, how- 
ever, bring out the special character of the assumption that the number 
of components is less than the number of tests, as well as of other 
simplifying particularisations of the $a_{ij}$.

In order to go as far as may reasonably be possible in a given case 
in expressing the test scores $z_i$ in terms of a smaller number of
components, an orderly procedure is required for selecting the components
in the order of the definiteness of their existence, or of their importance 
for our purposes, and rejecting any which prove to be of little impor- 
tance, or which are not clearly defined by the data. An analogous 
situation arises in fitting empirical curves. A series of the form 


$$y = a + bx + cx^2 + \cdots$$


may be fitted, the number of terms used being limited by the increas- 
ing probable errors of the coefficients of higher order, and also by the 
diminishing contributions to the total variance of y by these higher 
order terms. If the series is modified so as to consist of orthogonal 
functions, the successive coefficients have zero intercorrelations. Only 
those terms should be retained which are significant. Another analogy 
is the use of regression equations involving more and more variables 
$x_1, x_2, x_3, \ldots$ to explain or predict $y$, these being chosen in the order
of their contributions to the variance of y. 

These analogies suggest that, in choosing among the infinity of 
possible modes of resolution of our variables into components, we 





¹ Stanford University Press, 1928.







begin with a component $\gamma_1$ whose contributions to the variances of the
$z_i$ have as great a total as possible; that we next take a component $\gamma_2$,
independent of $\gamma_1$, whose contribution to the residual variance is as
great as possible; and that we proceed in this way to determine the
components, not exceeding n in number, and perhaps neglecting those
whose contributions to the total variance are small. This we shall
call the method of principal components. Its technique will be
considered in the subsequent sections.

If $z_1, z_2, \ldots, z_n$ be taken as rectangular coordinates in n dimen-
sions, each point represents a possible individual. If, as we assume,
the population is normally distributed, the loci of uniform density 
are concentric, similar, and similarly placed ellipsoids. The method 
of principal components, we shall see, is equivalent to choosing a set 
of coordinate axes coinciding with the principal axes of these ellipsoids. 

Now since the set of $x_i$ is capable of transformations such as changes
of units and other linear transformations, the ellipsoids may be
squeezed and stretched in any way. The method of principal com-
ponents can therefore be applied only if for each $x_i$ there exists a unit
of measure of unique importance, and if, furthermore, linear trans- 
formations, or at least those which do not correspond to rotations of 
axes, are unimportant. In other words, a metric—a definition of 
distance—must be assumed in the n-dimensional space, and not simply 
a set of axes; we must use Euclidean, not affine geometry, if the prin- 
cipal axes of the ellipsoids are to possess significance. For various 
purposes it might well happen that different metrics would be suitable. 
For example the assumption that all the tests, and all the sets of 
components to be considered, shall have their chance errors independ- 
ent of those of the others in the set and of equal variance, provides a 
unique metric. Other possible metrics might be derived from eco- 
nomic considerations, as by requiring that the component traits shall 
be of equal market value per unit and must not compete with or com- 
plement each other. The particular metric implied by the method of 
principal components is based on the assumption that the unweighted 
sum of the variances, where the total variance of each test is taken as
unity, is the essential quantity to be analyzed. 

Weights may in effect be introduced by changing the units of meas- 
ure, so as to make the standard deviations of the tests no longer unity. 
The correlations which appear in our subsequent work would in that 
case be replaced by covariances, and the 1’s in the diagonal of the 
determinant by the variances. However we shall not treat this
obvious generalization, excepting to discuss in Section 11 a possible
criterion for suitable weighting. Analysis of the unweighted sum of 
variances has somewhat the same sort of validity as the use of an 
unweighted mean of observations when we do not know what the 
weights should be. 

A question bound to arise is whether the ensuing analysis should 
be applied to the “‘raw”’ correlations or to those corrected for attenua- 
tion. This is equivalent to the question whether the unit of measure 
in the n-space is to be the standard deviation of the true or of the 
observed scores. If the true scores’ standard deviations are to be used 
as units, the analysis must be based on the corrected correlations, with 
1’s in the diagonal. This seems for some purposes a reasonable pro- 
cedure, and is exemplified in Section 5, p. 432, to which the reader 
may now pass directly if he is interested in learning the method rather 
than in its theory. If on the other hand the standard deviations of the
inexact observed scores are taken as units, the analysis must be per- 
formed upon a matrix having the reliability coefficients in the principal 
diagonal, with the raw correlations elsewhere. An advantage of this 
last method is that the relative influence of the more reliable tests upon 
the results is in general enhanced. 
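
The two alternatives may be set side by side for a pair of tests. The Python sketch below is an illustration only: the figures are hypothetical, the function name is not from the text, and the correction used is the familiar one of dividing the raw correlation by the square root of the product of the two reliability coefficients.

```python
import math

def correct_for_attenuation(r_observed: float, reliability_a: float,
                            reliability_b: float) -> float:
    """Estimated correlation between true scores (the usual correction for attenuation)."""
    return r_observed / math.sqrt(reliability_a * reliability_b)

# Hypothetical figures: raw correlation 0.48, reliabilities 0.80 and 0.90.
r_corrected = correct_for_attenuation(0.48, 0.80, 0.90)

# Units are standard deviations of true scores: corrected correlations, 1's in the diagonal.
matrix_true_units = [[1.0, r_corrected],
                     [r_corrected, 1.0]]

# Units are standard deviations of observed scores: raw correlations,
# reliability coefficients in the principal diagonal.
matrix_observed_units = [[0.80, 0.48],
                         [0.48, 0.90]]
```

In the second matrix the more reliable test has the larger diagonal entry, which is the sense in which its influence upon the results is enhanced.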

An easily verified property of the method is that the first of our 
principal components has a greater mean square correlation with the 
tests than does any other variable; and that among all variables uncor-
related with the first $q - 1$ principal components $(q = 2, 3, \ldots, n)$,
that having the greatest mean square correlation with the tests is the 
qth principal component. The argument is similar to that of the next 
section, and will not be given explicitly. 


2. DERIVATION OF THE METHOD 


Upon squaring each side of (3) and taking the mean value, it is
evident that the variance of $z_i$ may be written

$$a_{i1}^2 + a_{i2}^2 + \cdots + a_{in}^2,$$

and that the first term is correctly described as the contribution of
$\gamma_1$ to the variance of $z_i$. The sum of the contributions of $\gamma_1$ to the
variances of all the $z$’s is

$$S = a_{11}^2 + a_{21}^2 + \cdots + a_{n1}^2,$$

which in our abbreviated notation may be written

$$S = a_{i1}a_{i1}. \tag{9}$$




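Subject to (6), which we rewrite

$$a_{ij}a_{hj} = r_{ih}, \tag{6}$$

our present object is to choose the coefficients $a_{ij}$ so as to make S a
maximum. To this end we write

$$2T = S - \lambda_{ih}a_{ij}a_{hj},$$

where the $\lambda_{ih}\,(= \lambda_{hi})$ are Lagrange multipliers. We put

$$\frac{\partial T}{\partial a_{i1}} = a_{i1} - \lambda_{ih}a_{h1} = 0, \tag{10}$$

$$\frac{\partial T}{\partial a_{ij}} = -\lambda_{ih}a_{hj} = 0 \qquad (j \neq 1). \tag{11}$$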


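These two sets of equations may be combined in the single form

$$\frac{\partial T}{\partial a_{ij}} = \delta_{1j}a_{i1} - \lambda_{ih}a_{hj} = 0. \tag{12}$$

According to (11), the linear equations

$$\lambda_{ih}x_h = 0 \tag{13}$$

have the $n - 1$ solutions

$$\begin{array}{c}
a_{12}, a_{22}, \ldots, a_{n2} \\
a_{13}, a_{23}, \ldots, a_{n3} \\
\cdots \\
a_{1n}, a_{2n}, \ldots, a_{nn}
\end{array}$$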


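These must all be linearly independent, for otherwise the determinant
$a$ would vanish, contrary to hypothesis. Hence the rank of the system
(13) is 1. Therefore quantities $\alpha_1, \alpha_2, \ldots, \alpha_n, \beta_1, \ldots, \beta_n$ can be
found such that $\lambda_{ih} = \alpha_i\beta_h$. But since $\lambda_{ih} = \lambda_{hi}$, it follows that
$\alpha_i\beta_h = \alpha_h\beta_i$, so that the $\beta$’s are proportional to the $\alpha$’s. Hence we put
$\beta_i = \epsilon\alpha_i$, and

$$\lambda_{ih} = \epsilon\alpha_i\alpha_h. \tag{14}$$

Consequently (10) may be written in the form

$$a_{i1} = \epsilon\alpha_i\alpha_ha_{h1};$$

or, putting

$$\alpha_ha_{h1} = \sqrt{\frac{k}{\epsilon}},$$

in the form

$$\alpha_i = \frac{a_{i1}}{\sqrt{k\epsilon}}.$$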




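Substituting this in (14) and the result in (12) gives

$$a_{i1}(k\delta_{1j} - a_{h1}a_{hj}) = 0.$$

Since the quantities $a_{i1}$ are not all to be zero, this leads to

$$a_{h1}a_{hj} - k\delta_{1j} = 0. \tag{15}$$

Multiply by $a_{mj}$, sum for $j$, and use (6). The result is:

$$r_{hm}a_{h1} - ka_{m1} = 0. \tag{16}$$

Writing out these equations explicitly for $m = 1, 2, \ldots, n$, and drop-
ping the subscript 1 from the unknowns $a_{h1}$:

$$\begin{aligned}
(1 - k)a_1 + r_{21}a_2 + \cdots + r_{n1}a_n &= 0 \\
r_{12}a_1 + (1 - k)a_2 + \cdots + r_{n2}a_n &= 0 \\
\cdots\cdots\cdots & \\
r_{1n}a_1 + r_{2n}a_2 + \cdots + (1 - k)a_n &= 0
\end{aligned} \tag{17}$$

In order that these equations have solutions in which not all
unknowns are zero, it is necessary and sufficient that

$$f(k) = \begin{vmatrix}
1 - k & r_{12} & r_{13} & \cdots & r_{1n} \\
r_{21} & 1 - k & r_{23} & \cdots & r_{2n} \\
\cdots & \cdots & \cdots & \cdots & \cdots \\
r_{n1} & r_{n2} & r_{n3} & \cdots & 1 - k
\end{vmatrix} = 0. \tag{18}$$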








Equations such as (18) were first studied in connection with the per- 
turbations of the planets, and are known as characteristic equations. 
The general theory of such equations¹ shows that all the roots are real,
and that if there is a q-fold multiple root, the determinant in (18) has 
rank n — q when this root is substituted for k. 

If we substitute a simple (not a multiple) root of the characteristic 
equation for k in (17), we obtain a set of linear homogeneous equations 
of rank n — 1. These will have a family of solutions, all of which 
are proportional to one solution. The factor of proportionality is 
found by putting $j = 1$ in (15); this shows that the sum of squares, S,
which we are trying to maximize, is just equal to one of the roots of 
the characteristic equation. Since it is to be a maximum, we must 
obviously use the greatest root. 

The problem of finding a component $\gamma_1$ which will account for
as large as possible a part of the total variance is thus solved by finding 





¹ Cf. for example G. Kowalewski, Einführung in die Determinantentheorie,
1st ed., Leipzig, 1909, pp. 126 and 274. The proof of the reality of the roots was
given by Cauchy in the Philosophical Magazine for 1852, and may be found in 
the treatment of quadric surfaces in books on solid analytic geometry. 







the largest root $k_1$ of the characteristic equation (18), substituting
in (17), finding any solution $a_1, a_2, \ldots, a_n$ of these linear equations,
and dividing these last values by the square root of the sum of their
squares and multiplying by $\sqrt{k_1}$. The resulting quantities are the
coefficients of $\gamma_1$ in the expression (3) giving the test scores in terms
of the independent components. A simplified numerical method is given
in §4 below.
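
For two tests the whole procedure can be carried through by hand, and a Python sketch of it follows. The sketch is illustrative only: the function name and the correlation 0.6 are hypothetical, and the correlation is assumed not to be zero, since otherwise the two roots coincide. The characteristic equation is then $(1 - k)^2 - r^2 = 0$, and the solution is scaled so that the squares of the coefficients sum to the root, as (15) with $j = 1$ requires.

```python
import math

def first_component_two_tests(r: float):
    """Largest root of (1 - k)^2 - r^2 = 0 and the coefficients of the first component."""
    k1 = 1.0 + abs(r)  # the two roots are 1 + r and 1 - r
    # Any solution of (17) with k = k1, i.e. (1 - k1)*a1 + r*a2 = 0.
    solution = [1.0, math.copysign(1.0, r)]
    # Divide by the square root of the sum of squares, then multiply by sqrt(k1).
    norm = math.sqrt(sum(v * v for v in solution))
    coefficients = [math.sqrt(k1) * v / norm for v in solution]
    return k1, coefficients

k1, a = first_component_two_tests(0.6)

# Left member of the first equation of (17); it should vanish.
residual = (1.0 - k1) * a[0] + 0.6 * a[1]
```

Here the first component accounts for the fraction $k_1/2$ of the total variance of the two tests.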

If the q largest roots of the characteristic equation are equal to
$k_1$, the determination of $a_{11}, a_{21}, \ldots, a_{n1}$ is not unique, since the rank
of the linear equations (17) is then only $n - q$. In this case q linearly
independent solutions,

$$\begin{array}{c}
a_{11}, a_{21}, \ldots, a_{n1} \\
a_{12}, a_{22}, \ldots, a_{n2} \\
\cdots \\
a_{1q}, a_{2q}, \ldots, a_{nq}
\end{array}$$

of (17), can be found. Moreover these solutions may be so chosen as
to be “orthogonal” to each other, in the sense that

$$a_{hi}a_{hj} = 0 \qquad (i \neq j).$$

They may then be taken as the coefficients of q independent com-
ponents $\gamma_1, \ldots, \gamma_q$, all of which contribute equally to the total
variance.

When the coefficients of the component $\gamma_1$ which makes the largest
contribution to the total variance, or of the q components which equally
make maximum contributions, have been determined, the next
problem is to find a component making a maximum contribution to
the residual portion of the variance. The argument and procedure
are virtually the same as before. If just one component has previously
been determined, the subscript $j$ in (10), (11), (12), and (13) takes the
values $2, 3, \ldots, n$, and the subscript 1 is replaced by 2. Proceeding
in this way we determine the coefficients of $\gamma_1, \gamma_2, \gamma_3, \ldots, \gamma_n$, in
the order of the contributions of these components to the sum of the
variances of the $z$’s.

The orthogonality among principal components which we have
postulated for multiple roots holds also for simple roots. This
follows from (15), which, for $j = 2$, gives $a_{h1}a_{h2} = 0$; and similarly
we have for all unequal values of $i$ and $j$, $a_{hi}a_{hj} = 0$.

Having thus derived the equations for the $z$’s in terms of the $\gamma$’s,
it is desirable to solve these, so as to be able to assign a value to each
of these principal components $\gamma$ in terms of the test scores. This
will make it possible to assign a value for each principal component
to any individual upon whom the tests had been made.

The solution is remarkably simple. 

In (15), $k$ is the same as $k_1$. For the $i$th component we replace
the subscript 1 in (15) by $i$, and $k$ by $k_i$. This gives

$$a_{hi}a_{hj} = k_i\delta_{ij} \qquad \text{(not summed for } i\text{)}, \tag{19}$$

which simply means that the sum of the products of corresponding
elements of two different columns of the determinant $a$ is zero, and
that the sum of the squares of the elements of a column is the root
of the characteristic equation corresponding to this column.

Recalling that we have denoted the ratio of the cofactor of $a_{mj}$
to $a$ by $A_{mj}$, we multiply both sides of (19) by $A_{mj}$ and sum for $j$.
With the help of (4) this gives

$$a_{mi} = k_iA_{mi} \qquad \text{(not summed for } i\text{)}. \tag{20}$$

Since (5) gives by a mere change of indices,

$$\gamma_i = A_{mi}z_m,$$

it follows that

$$\gamma_i = \frac{a_{mi}z_m}{k_i} \qquad \text{(not summed for } i\text{)}. \tag{21}$$
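
Formula (21) may be tried on the case of two tests with correlation 0.6, whose roots are 1.6 and 0.4. The Python sketch below is an illustration only: the function name and the individual's standard scores are hypothetical. It computes the two principal components for one individual and then recovers the standard scores from them by (3).

```python
import math

def component_scores(z, a, roots):
    """gamma_i = (sum over m of a_mi * z_m) / k_i, as in equation (21)."""
    return [sum(a[m][i] * z[m] for m in range(len(z))) / roots[i]
            for i in range(len(roots))]

# Two tests with correlation 0.6: roots 1.6 and 0.4, coefficients a[m][i].
roots = [1.6, 0.4]
a = [[math.sqrt(0.8), math.sqrt(0.2)],
     [math.sqrt(0.8), -math.sqrt(0.2)]]

# Hypothetical standard scores of one individual on the two tests.
z = [1.0, 0.5]
gamma = component_scores(z, a, roots)

# Recover the standard scores from the components by (3): z_m = a_mi * gamma_i.
z_back = [sum(a[m][i] * gamma[i] for i in range(2)) for m in range(2)]
```

No inversion of the determinant is needed, which is the sense in which the solution is remarkably simple.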


3. GEOMETRICAL MEANING 


Geometrically the foregoing procedure corresponds to rotating
the rectangular axes of $z_1, z_2, \ldots, z_n$ so that the new coordinate
axes lie along the principal axes of the ellipsoids of uniform density.
The squares of the lengths of the principal axes of one of these ellipsoids
are proportional to the $k$’s. These facts are not immediately obvious,
but may be easily proved as follows. Let $w$ be the determinant of the
correlation coefficients $r_{ij}$, and let

$$R_{ij} = R_{ji} = \frac{\text{cofactor of } r_{ij} \text{ in } w}{w},$$

so that

$$r_{ij}R_{jk} = \delta_{ik}. \tag{22}$$

From the theory of multiple normal distribution, the ellipsoids of
uniform density are given by

$$R_{ij}z_iz_j = \text{constant}. \tag{23}$$


The procedure developed in treatises on solid analytic geometry is to
solve the equation

$$\begin{vmatrix}
R_{11} - \lambda & R_{12} & \cdots & R_{1n} \\
R_{21} & R_{22} - \lambda & \cdots & R_{2n} \\
\cdots & \cdots & \cdots & \cdots \\
R_{n1} & R_{n2} & \cdots & R_{nn} - \lambda
\end{vmatrix} = 0, \tag{24}$$

to substitute the roots $\lambda_1, \lambda_2, \ldots, \lambda_n$ in the homogeneous linear
equations

$$(R_{ij} - \lambda\delta_{ij})l_j = 0, \tag{25}$$

and for each root to solve these equations. Calling the solution
corresponding to $\lambda_1$

$$l_{11}, l_{21}, \ldots, l_{n1},$$

that corresponding to $\lambda_2$

$$l_{12}, l_{22}, \ldots, l_{n2},$$

and so on, the equations of rotation to new rectangular axes $y_1, \ldots, y_n$ are

$$z_i = l_{ij}y_j; \tag{26}$$

the solution of these equations for the $y_i$ has the same coefficients
in transposed order, and runs:

$$y_i = l_{ji}z_j.$$

The equation of the ellipsoid in the new coordinates is

$$\lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2 = \text{constant}. \tag{27}$$
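Now multiply both sides of (24) by

$$\begin{vmatrix}
1 & r_{12} & \cdots & r_{1n} \\
r_{21} & 1 & \cdots & r_{2n} \\
\cdots & \cdots & \cdots & \cdots \\
r_{n1} & r_{n2} & \cdots & 1
\end{vmatrix}.$$

By (22), the result is

$$\begin{vmatrix}
1 - \lambda & -\lambda r_{12} & \cdots & -\lambda r_{1n} \\
-\lambda r_{21} & 1 - \lambda & \cdots & -\lambda r_{2n} \\
\cdots & \cdots & \cdots & \cdots \\
-\lambda r_{n1} & -\lambda r_{n2} & \cdots & 1 - \lambda
\end{vmatrix} = 0.$$

Upon dividing each row by $-\lambda$, and setting $k = 1/\lambda$, this reduces to
the characteristic equation (18). If we also set $k = 1/\lambda$ in (25),
multiply by $r_{hi}$, and sum for $i$, the resulting equations have the same
coefficients as (17). For simple roots the solutions are therefore the
same, apart from a common factor. For multiple roots the solutions
have in both cases the same type of indeterminateness. Since
$\lambda_i = 1/k_i$, (27) becomes

$$\frac{y_1^2}{k_1} + \frac{y_2^2}{k_2} + \cdots + \frac{y_n^2}{k_n} = \text{constant}, \tag{28}$$

which shows that the squares of the lengths of the axes are pro-
portional to $k_1, k_2, \ldots, k_n$.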

If, instead of the $y$’s or $z$’s, the $\gamma$’s or other independent and equally
variable quantities be taken as rectangular coordinates, the ellipsoids
are squeezed and stretched into spheres. Each test is represented by a
line through the origin. The correlation between two tests is the
cosine of the angle between their lines. L. L. Thurstone has used
coordinates equivalent to these.¹
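
The statement about the cosine may be checked directly. In the Python sketch below, which is illustrative only, each test is represented by its row of coefficients on the independent components; the coefficients and the function name are hypothetical. Since the squares of the coefficients of a test in standard measure sum to unity, the cosine of the angle between two such lines is the sum of products of corresponding coefficients, which by (6) is the correlation.

```python
import math

def angle_between_tests(row_i, row_k):
    """Angle in radians between two test lines given by their component coefficients."""
    dot = sum(p * q for p, q in zip(row_i, row_k))
    length_i = math.sqrt(sum(p * p for p in row_i))
    length_k = math.sqrt(sum(q * q for q in row_k))
    return math.acos(dot / (length_i * length_k))

# Two tests with correlation 0.6, as rows of coefficients on two components.
a = [[1.0, 0.0],
     [0.6, 0.8]]
theta = angle_between_tests(a[0], a[1])
cosine = math.cos(theta)
```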

Not only must the roots of the characteristic equation be real; 
they must all be positive. If there were a negative root, (28) would 
represent, not an ellipsoid, but a hyperboloid extending to infinity. 
Since the density of probability is to be uniform over this locus, the 
probability of a sample deviating in certain tests from the mean by 
more than any given amount would be infinite, which is absurd. 

According to the original Spearman theory of the mind, one
important general factor accounts for the bulk of the variance of 
mental tests, other components being of minor importance. If this 
is true, one of the roots of the characteristic equation should be much 
larger than any of the others. The ellipsoids should be needle-shaped. 
But if two or more of the roots are equal, there will be a corresponding 
number of independent components which contribute equally to 
the variance. In this case the ellipsoids will be figures of revolution. 
If n = 3 and the two largest roots are equal, while the third is very 
small, the ellipsoids will be thin discs. 

In distinguishing among such theories it is of course not the
absolute values of the roots that are important, but their ratios, and
particularly the ratios among the largest of the roots. The sum of





¹ Multiple Factor Analysis. Psychological Review, Vol. XXXVIII, 1931, pp.
406-427.

Since this was written Professor Thurstone has kindly sent me a pamphlet 
he has prepared for class use, in which he uses the same geometric interpretation 
as in the present section, and discusses the problem from essentially the same 
standpoint as that taken in §1. His iterative procedure appears to have no rela- 
tion to that of §4. In June, 1932, Professor Thurstone presented at the Syracuse 
meeting of the American Association for the Advancement of Science certain of 
the considerations which have served as a point of departure for this paper. 







the roots always equals the number n of the tests, as appears from
the form of (18); hence the fraction of the total variance contributed
by the $i$th component is $k_i/n$.

Developing (18) we have

$$f(k) = (-1)^n(k^n - nk^{n-1} + S_2k^{n-2} - S_3k^{n-3} + \cdots \pm S_n) = 0, \tag{29}$$

where $S_2$ is the sum of the two-rowed principal minors in the deter-
minant $w$ of the correlations, $S_3$ the sum of the three-rowed principal
minors, and so on.
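
For n = 3 the development can be verified numerically. The Python sketch below is an illustration only: the correlations and the function names are hypothetical. It evaluates the determinant in (18) directly and compares it with the developed form, in which the last term is $-S_3$ and $S_3$ is the determinant of the correlations itself.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def f_direct(r, k):
    """Determinant of the correlations with k subtracted from each diagonal element."""
    return det3([[r[i][j] - (k if i == j else 0.0) for j in range(3)]
                 for i in range(3)])

def f_developed(r, k):
    """The developed form for n = 3: -(k^3 - 3k^2 + S2*k - S3)."""
    # S2 is the sum of the two-rowed principal minors; S3 is the full determinant.
    s2 = sum(r[i][i] * r[j][j] - r[i][j] * r[j][i]
             for i in range(3) for j in range(i + 1, 3))
    s3 = det3(r)
    return -(k ** 3 - 3 * k ** 2 + s2 * k - s3)

# Hypothetical correlations among three tests.
r = [[1.0, 0.5, 0.3],
     [0.5, 1.0, 0.4],
     [0.3, 0.4, 1.0]]
differences = [f_direct(r, k) - f_developed(r, k) for k in (0.0, 0.5, 1.7)]
```

The coefficient 3 of the second term is the sum of the diagonal elements, which is why the roots always sum to the number of tests.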


4. ITERATIVE SOLUTION 


The explicit calculation of the determinant w and its principal 
minors, and the solution of the characteristic equation and of the 
homogeneous linear equations (17), would be a laborious computation. 
A vast saving of arithmetical effort is effected by the following iterative 
method, which yields simultaneously a root and the corresponding 
coefficients, the roots appearing in order of magnitude, the greatest 
first. This makes it possible to stop whenever it is evident that all 
the important principal components have been obtained. Since the 
sum of the roots is n, the fraction of the total accounted for at any 
stage is always in evidence. During the calculation of each principal 
component, an error at any stage is rectified in the next. The risk 
of serious numerical error is therefore negligible, especially if all the 
roots are calculated and their sum compared with n. 


If numbers $a_1, a_2, \ldots, a_n$ proportional to the direction cosines
of any line through the origin be substituted in the equations

$$a_i' = r_{ij}a_j \qquad (i = 1, 2, \ldots, n) \tag{30}$$

the quantities on the left will be proportional to the direction cosines
of a new line through the origin into which we shall consider the
original line to have moved. Under this transformation, the invariant
lines will be those for which quantities $k$ exist such that $a_i' = ka_i$.
In this case, (30) reduces to (17). Thus, for each invariant line, the
direction cosines are proportional to a solution of (17), while $k$ is a
root of the characteristic equation. It follows that the invariant
lines are the principal axes. Hence, if numbers $a_1, \ldots, a_n$ can be
found which, substituted in the right-hand members of (30), give
$ka_1, \ldots, ka_n$, these numbers are proportional to the direction
cosines of one of the principal axes and to the coefficients of one of
the principal components in the expressions for the test scores $z_i$,
while $k$ is the sum of the contributions of this component to the
variances of the test scores.

If two or more roots of the characteristic equation are equal, the 
ellipsoids are figures of revolution, with two or more equal axes. Every 
line in the plane or hyperplane of the equal axes is invariant under the 
transformation; the axes themselves may be taken as arbitrary per- 
pendicular lines in this plane or hyperplane. 

If with respect to new coordinate axes coinciding with the principal
axes of the ellipsoids the direction cosines of a line are proportional to
$b_1, b_2, \ldots, b_n$, the transformation

$$b_1' = k_1 b_1, \quad b_2' = k_2 b_2, \quad \ldots, \quad b_n' = k_n b_n, \tag{31}$$

where $k_1, k_2, \ldots, k_n$ are as before the roots of the characteristic
equation, will geometrically be the same as (30), since the invariant
lines are the principal axes, together perhaps with the lines in the plane
or hyperplane determined by two or more equal axes. Algebraically,
this amounts to putting

$$a_i' = a_{i\alpha} b_\alpha', \qquad a_j = a_{j\beta} b_\beta.$$

Substituting these expressions in (30) and in the result setting

$$r_{ij} a_{j\beta} = k_\beta a_{i\beta} \quad \text{(not summed for } \beta\text{)},$$

a relation which is the generalization of (16), we multiply by $A_{i\gamma}$, sum
for $i$, and use (4). The result is (31).
Let the notation be so arranged that

$$k_1 \ge k_2 \ge \cdots \ge k_p.$$

If $k_1$ is greater than $k_2$, and if $b_1 \ne 0$, we then have from (31) that each
of the ratios $b_2'/b_1', \ldots, b_p'/b_1'$ is numerically less than the
corresponding ratio $b_2/b_1, \ldots, b_p/b_1$. When the transformation is
repeated, the absolute values of these ratios are further diminished,
and with further repetitions approach zero in geometrical progressions.
If, however, the q greatest roots are equal, the ratios among the first 
q direction cosines remain unchanged under the transformation, while 
the remaining p — q direction cosines approach zero. Thus if we 
start with any line which does not lie in the hyperplane of the p — 1 
axes perpendicular to the longest axis, this line will, under iteration 
of the transformation, approach the longest axis if the greatest root is 
unique, or, if there are several equal roots greater than the rest, some 
position which may be taken as the longest axis. 







From this it is evident that the coefficients of the greatest principal
component of the tests may be obtained with any required accuracy
by inserting an arbitrary set of numbers $a_1, \ldots, a_n$ in the right
members of (30), multiplying or dividing the resulting numbers
$a_1', \ldots, a_n'$ by any constant, again substituting in the right members
of (30), and repeating the process until, to the required degree of
accuracy, the quantities obtained are multiples of the preceding
quantities by a constant $k_1$. This constant will be the greatest root
of the characteristic equation. The process fails only in the infinitely
improbable case of the initial values being linearly dependent upon
principal components other than the first; the greatest of these
components is then approached.





The coefficients $a_{11}, a_{21}, \ldots, a_{n1}$ of the first principal component
are found by multiplying each of the quantities $a_1, a_2, \ldots, a_n$ by
$\sqrt{k_1 / \Sigma a_i^2}$.
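
The iterative process just described may be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `power_iteration`, the tolerance, and the two-variable matrix with correlation .5 are hypothetical choices made here, not part of the original computation. At each round the trial values are substituted in the right members of (30) and divided by the first of the resulting numbers; the final scaling makes the sum of squares of the coefficients equal to the root.

```python
import math

def power_iteration(r, trial, tol=1e-12, max_iter=10_000):
    """Return (k, a): the greatest root of r and coefficients with sum(a_i^2) = k."""
    a = list(trial)
    k = 0.0
    for _ in range(max_iter):
        # Substitute the trial values in the right members of (30).
        new = [sum(r_ij * a_j for r_ij, a_j in zip(row, a)) for row in r]
        # The multiplier for the first element approximates the root k.
        k_new = new[0] / a[0]
        # Divide by a constant (here the first element) before repeating.
        a = [x / new[0] for x in new]
        if abs(k_new - k) < tol:
            k = k_new
            break
        k = k_new
    # Scale by sqrt(k / sum of squares) to obtain the coefficients.
    scale = math.sqrt(k / sum(x * x for x in a))
    return k, [scale * x for x in a]

# Hypothetical two-variable case with correlation .5; the roots are 1 + r and 1 - r.
k1, a1 = power_iteration([[1.0, 0.5], [0.5, 1.0]], [1.0, 0.3])
```

For this small case the process settles on the root $1 + r = 1.5$, with equal coefficients for the two variables.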


After the coefficients of $\gamma_1$ have been determined, those of $\gamma_2$ are
sought. This might be done by starting with arbitrary values orthogonal
to the coefficients of $\gamma_1$, which would give the desired result if
the calculations were carried out exactly. But on account of the
practical necessity of working only to a limited number of decimal
places, the transformed line will at each stage deviate from the plane
perpendicular to the first axis, and will drift off toward this first axis.
The tendency could be offset by applying corrections to restore
orthogonality each time, but this is excessively laborious. Instead, we
revise the matrix of correlations by subtracting $a_{i1} a_{j1}$ from the element
in the $i$th row and $j$th column. This produces the matrix of covariances
of the quantities $z_i - a_{i1}\gamma_1$, which are uncorrelated with $\gamma_1$.
The iterative process applied to the reduced matrix yields $k_2$ and the
coefficients of $\gamma_2$. To obtain $\gamma_3$ we apply the iterative process to the
further reduced matrix in which the element in the $i$th row and $j$th
column is $r_{ij} - a_{i1} a_{j1} - a_{i2} a_{j2}$; and so on. When this method is used,
it is not even essential to take the initial trial values of the coefficients
at any of the later stages precisely orthogonal to the coefficients already
determined; round numbers may be chosen at the beginning, and the
process will converge to the correct values anyhow.
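
The reduction of the matrix may be illustrated by a short Python sketch. The helper names `greatest_component` and `deflate`, the fixed number of repetitions, and the two-variable matrix are assumptions made for illustration only. The trial values for the second stage are deliberately not orthogonal to the first set of coefficients, in keeping with the remark above.

```python
import math

def greatest_component(r, trial, steps=200):
    """Iterate (30) on r; return (k, a) with sum(a_i^2) = k."""
    a = list(trial)
    for _ in range(steps):
        new = [sum(x * y for x, y in zip(row, a)) for row in r]
        norm = math.sqrt(sum(x * x for x in new))
        a = [x / norm for x in new]            # rescale by a constant each round
    ra = [sum(x * y for x, y in zip(row, a)) for row in r]
    k = sum(x * y for x, y in zip(a, ra))      # the multiplier k in a' = k a
    return k, [math.sqrt(k) * x for x in a]

def deflate(r, a):
    """Subtract a_i * a_j from the element in row i, column j."""
    n = len(r)
    return [[r[i][j] - a[i] * a[j] for j in range(n)] for i in range(n)]

# Hypothetical two-variable matrix with correlation .5.
r = [[1.0, 0.5], [0.5, 1.0]]
k1, a1 = greatest_component(r, [1.0, 0.3])
# Round trial values, not orthogonal to a1, are used for the second stage.
k2, a2 = greatest_component(deflate(r, a1), [1.0, 0.3])
```

The two roots found in this way add to 2, the number of variables, which affords the check on the arithmetic mentioned earlier.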

A convenient procedure is to divide each of the trial values of any
set of coefficients by a fixed one of them. The next value obtained
for this coefficient will then be an approximation to the appropriate
characteristic number $k$. The process should be started with trial
values of one digit each, the largest of these values corresponding to
variates which are on the whole most highly correlated with the rest,
as judged by inspection. Each digit should be accurately determined,
by repetition until stationary values are reached, before the
calculations are carried to another place.

The labor is sometimes reduced if the deviations of a set of trial
$a$’s from the preceding set are multiplied by the rows of the correlation
matrix and the results added to the last trial $a$’s to get the next set,
instead of multiplying the trial values themselves by the correlations.
However, this method does not automatically correct errors unless
the values obtained finally are multiplied by the correlations and
added.


5. EXAMPLE 


Truman L. Kelley (op. cit., p. 100) gives the correlations found in a
sample of 140 seventh-grade children among numerous tests. We
select the correlations, corrected for attenuation, among: (1) reading
speed, (2) reading power, (3) arithmetic speed, (4) arithmetic power.
Curtailed to three places, these are, in the natural order:








    1.      .698    .264    .081
     .698  1.      -.061    .092
     .264  -.061   1.       .594
     .081   .092    .594   1.

The correlations being slightly higher on the whole for the first
than for the later tests, we take as trial values

    1, .9, .8, .7.
Multiplying by the rows of the matrix of correlations, we have, to
two places, 1.90, 1.61, 1.42, 1.33. If we divide each of these values
by the first in order to make them comparable with the initial
quantities, we obtain as our second approximation:
\ . 2 
Multiplying these by the correlations, we obtain 1.85, 1.57, 1.37, and
1.30. Dividing these by 1.85 gives

    1, .85, .74, .70.

This is close enough to warrant carrying the calculations to one more
decimal place. We next obtain 1.846, 1.567, 1.368, 1.304; and upon
division by 1.846,

    1, .849, .741, .707.




The next trial gives, after division by 1.846,

    1, .849, .743, .706,

values which we adopt after noting that the differences,

    0, 0, .002, -.001,

multiplied by any row of the correlation matrix, produce a total which
does not affect the third decimal place. We divide $k_1 = 1.846$ by
the sum of the squares of the final trial values, extract the square root,
and multiply by the trial values. This gives

    $a_{11} = .816$        $k_1 = 1.846$
    $a_{21} = .693$
    $a_{31} = .606$
    $a_{41} = .576$
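
The calculation for the first component may be retraced mechanically. The following Python sketch is offered only as an illustration: it uses the four correlations and the trial values 1, .9, .8, .7 given above, but it carries full machine precision and one hundred repetitions (both assumptions of this sketch), so the figures it reaches agree with the hand computation only to about the second decimal place.

```python
import math

# Correlations corrected for attenuation, in the natural order of the four tests.
R = [
    [1.000, 0.698, 0.264, 0.081],
    [0.698, 1.000, -0.061, 0.092],
    [0.264, -0.061, 1.000, 0.594],
    [0.081, 0.092, 0.594, 1.000],
]

a = [1.0, 0.9, 0.8, 0.7]   # trial values from the text
k1 = 0.0
for _ in range(100):
    # Multiply the trial values by the rows of the correlation matrix.
    prod = [sum(x * y for x, y in zip(row, a)) for row in R]
    k1 = prod[0]            # approximates the root, since a[0] is held at 1
    a = [p / k1 for p in prod]

# Divide k1 by the sum of squares, extract the root, multiply by the trial values.
scale = math.sqrt(k1 / sum(x * x for x in a))
coeffs = [scale * x for x in a]
```

Carried this far, the root comes out close to 1.85 and the coefficients lie within a few units in the third decimal place of those adopted above.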


Subtracting the products $a_{i1} a_{j1}$ from the elements of the correlation
matrix we obtain

     .330    .129   -.233   -.392
     .129    .517   -.484   -.310
    -.233   -.484    .631    .243
    -.392   -.310    .243    .666


Starting from the trial values -.6, -1, 1, 1, we obtain for the second
principal component:

    $a_{12} = -.438$        $k_2 = 1.465$
    $a_{22} = -.620$
    $a_{32} = .674$
    $a_{42} = .660$


Subtracting the products of these numbers from the reduced matrix 
we find: 


.138 —.142 .062 —.103 
—.142 .133 —.066 .099 
.062 —.066 .177 —.202 
—.103 .099 —.202 .230 








This time we take the trial values —1, 1, —1, 1.3, and obtain another 
root, another set of coefficients, and another reduced matrix. A 
fourth application of the iterative process completes the resolution of 
the tests into their principal components. The results are combined 
in the table below, in which each column corresponds to a principal 
component, and each of the last four rows to a test. The entries in 





the last four rows are the coefficients of the $\gamma$’s in the expressions for
the $z$’s; they are at the same time the correlations of the $\gamma$’s with the $z$’s.








    Principal component                 1        2        3        4     Totals
    Root $k$                          1.846    1.465     .521     .167    3.999
    Percentage of total variance      46½      36½       13        4      100
    Reading speed                      .818    -.438    -.292     .240
    Reading power                      .695    -.620     .288    -.229
    Arithmetic speed                   .608     .674    -.376    -.193
    Arithmetic power                   .578     .660     .459     .143




















The chief component seems to measure general ability; the second, 
a difference between arithmetical and verbal ability. These two 
account for eighty-three per cent of the variance. An additional 
thirteen per cent seems to be largely a matter of speed vs. deliberation. 
The remaining variance is trivial. 


6. SAMPLING ERRORS 


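The exact distribution of the roots of the characteristic equation,
or of their ratio, can be found at once when $n = 2$. Indeed, we have
in this case,

$$k_1 = 1 + r, \qquad k_2 = 1 - r;$$

and the distribution of $r$ in samples of $N$ from a normally correlated
population is fully known.

For the case of zero correlation in the population, the distribution
of $r$ reduces to

$$\frac{1}{\sqrt{\pi}} \, \frac{\Gamma[\tfrac{1}{2}(N - 1)]}{\Gamma[\tfrac{1}{2}(N - 2)]} \, (1 - r^2)^{\frac{N-4}{2}} \, dr.$$

If we put

$$u = \frac{k_1}{k_2}, \quad \text{so that} \quad r = \frac{u - 1}{u + 1} = \frac{k_1 - k_2}{k_1 + k_2},$$

this is transformed into

$$\frac{2}{\sqrt{\pi}} \, \frac{\Gamma[\tfrac{1}{2}(N - 1)]}{\Gamma[\tfrac{1}{2}(N - 2)]} \, \frac{(4u)^{\frac{N-4}{2}}}{(u + 1)^{N-2}} \, du.$$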





For n > 2, this same distribution may be used as a close approxima- 
tion for differentiating between any two roots. 

Thus, to determine whether $k_1$ is significantly greater than $k_2$
in the example worked out in the last section, we compute

$$r = \frac{1.846 - 1.465}{1.846 + 1.465} = .115.$$


Since the sample consists of one hundred forty individuals, we may
treat this value of $r$ as a sample from a normal distribution of zero
mean and standard deviation $1/\sqrt{139} = .085$. Since $r$ is only 1.35
times its standard error, the probability of a greater discrepancy is
about .18, and the two roots cannot be called significantly different.
The fact that a single component accounts for so much as 46½ per
cent of the variance of the four tests tends to support the idea
introduced by Spearman that one general factor enters into all tests to a
dominating extent; but this argument is considerably weakened by the
fact that $\gamma_2$ contributes nearly as much variance as $\gamma_1$, the magnitudes
of the two contributions being indeed not clearly distinguishable in a
sample of this size. The contribution of $\gamma_3$ is however definitely less
than that of $\gamma_2$; for

$$r = \frac{1.465 - .521}{1.465 + .521} = .474$$

is some 5.58 times the standard error .085. The third characteristic
root exceeds the fourth even more definitely, the ratio of the
corresponding value of $r$ to its standard error being 6.1.
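
The three comparisons of successive roots may be retraced with a few lines of Python. This is a sketch only; the function name `r_from_roots` is a label introduced here, and the figures agree with those quoted above only to the accuracy of the rounded roots.

```python
import math

roots = [1.846, 1.465, 0.521, 0.167]   # k_1, ..., k_4 from the example
N = 140
std_error = 1 / math.sqrt(N - 1)       # about .085

def r_from_roots(k_a, k_b):
    """The quantity r = (k_a - k_b) / (k_a + k_b) for two roots."""
    return (k_a - k_b) / (k_a + k_b)

# Ratio of r to its standard error for each pair of successive roots.
ratios = [r_from_roots(roots[i], roots[i + 1]) / std_error for i in range(3)]
```

The first ratio falls well short of 2, while the second and third exceed 5, in agreement with the conclusions drawn in the text.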

Apart from the question of equality of any two roots, it may some- 
times be desired to find upper and lower limits such that, correspond- 
ing to any given degree of probability, the ratio of the roots may be 
said to lie between these limits. Such limits may be deduced from the 
corresponding ones for the correlation coefficient. To obtain these, 
we may utilize R. A. Fisher’s transformation to $z = \tanh^{-1} r$, whose
distribution is nearly normal with a variance, $1/(N - 3)$, independent
of the population value.¹

As an example, let us find upper and lower fiduciary limits
corresponding to the probability .05 for the ratio $k_2/k_3$, which has just been
seen to differ clearly from unity. From Fisher’s table, we find that the
value $r = .474$ corresponds to $z = .515$. The table of the normal
probability integral shows that a quantity deviates from its mean
more than 1.96 times its standard deviation with a probability .05.
The deviation

$$1.96\,\sigma_z = .117$$





1 Statistical Methods for Research Workers, Oliver and Boyd, Chap. VI. 




is therefore to be added to and subtracted from the sample value .515, 
giving .398 and .632 as the fiduciary limits for z. Again referring to 
the table of hyperbolic tangents, we find .378 and .559 as the corre- 
sponding values of r. Inserting each of these limits for r in the expres- 
sion defining u, we have 2.21 and 3.53 as the extreme values of the 
ratio which can plausibly be assumed, corresponding to the probability 
.05. 
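
The passage from $r$ to $z$ and back to the ratio of the roots may be followed in a brief Python sketch. It is illustrative only: the half-width .117 is taken from the text as printed and is not recomputed, and the function name `ratio_from_r` is introduced here to denote the relation $u = (1 + r)/(1 - r)$ implied by the expression defining $u$.

```python
import math

r = 0.474
half_width = 0.117                     # deviation quoted in the text, taken as given

z = math.atanh(r)                      # Fisher's transformation z = tanh^-1 r
z_lo, z_hi = z - half_width, z + half_width
r_lo, r_hi = math.tanh(z_lo), math.tanh(z_hi)

def ratio_from_r(r):
    """Invert r = (u - 1) / (u + 1) to recover the ratio u of two roots."""
    return (1 + r) / (1 - r)

u_lo, u_hi = ratio_from_r(r_lo), ratio_from_r(r_hi)
```

The limits so obtained agree with those read from the tables to within a unit or so in the second decimal place.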

This use of the sampling distribution of r is subject to two qualifica- 
tions. In the first place, it applies only in comparing two components 
which are definitely identified, otherwise than by their having a par- 
ticular order among the entire set of n components determined, such 
as being the two greatest, or the greatest and least. This situation 
is common to all problems in which a number of observations are made 
on a quantity, and two of these observations are examined for the 
significance of their difference from each other. If the two are selected 
because of their positions relatively to the others in the sample, they 
cannot be compared accurately by means of the standard error of the 
distribution as if they were not so selected. In many examples, how- 
ever, including the foregoing, this consideration does not materially 
affect the conclusions to be drawn. 

It must also be remembered that the directions as well as the
magnitudes of the principal components are subject to sampling errors, and
that these errors in determination of the directions will interfere with
the accuracy of the foregoing treatment of the $k$’s by transformation
into correlation coefficients. That the inaccuracy introduced in this
way is very slight is suggested by the geometry. Suppose that a
principal plane of an ellipsoid derived from a sample makes a small
angle $\theta$ with the corresponding principal plane of the population
ellipsoid. Then the section of the population ellipsoid made by the sample
plane will be an ellipse whose principal axes bear to the corresponding
principal axes of the population ellipsoid ratios differing from unity by
quantities of order $\theta^2$. Hence if the standard error of $\theta$, like most
standard errors, is of order $1/\sqrt{N}$, those of the semiaxes, and
consequently those of the $k$’s and of the $r$’s calculated from them, will be of
order $1/N$. This suggests that a suitable correction for this kind of
error could be made in the foregoing example by changing the standard
error used, namely .085, by something of the order of $1/140 = .007$.

Essentially the same results may be reached in another way. The 
k’s are, in the population, variances of independent variates. The 
ratio u of two independent estimates of variance, each based on N — 2 







degrees of freedom, as for example in independent samples of $N - 1$,
has exactly the distribution of $u$ given above. This way of looking
at the matter has the further advantage of showing that, for very large
samples, $\log k$ may be treated approximately as a normally distributed
variate with variance $1/(2N - 4)$. The $n$ values of $\log k$ may then
be treated as independent samples from a normal distribution having
this variance.


7. MINIMUM NUMBER OF INDEPENDENT COMPONENTS. RELIABILITY 
COEFFICIENTS 


But of still greater importance is the fact that by treating the k’s 
as estimates of variance in n orthogonal directions we may compare 
them, not only with each other, but also with the variance to be 
expected on account of the inaccuracy of the tests as revealed by their 
self-correlations, or reliability coefficients. This is of major interest, 
for if some of the principal components found contribute so little to 
the variance that their reality is in doubt, it is possible to assume that 
the number of independent components is less than the number of 
variables measured. The following tests may therefore serve as sub- 
stitutes both for Spearman’s use of tetrads and for the more elaborate 
criteria developed by Kelley in “Crossroads in the Mind of Man.”

Upon administering a test twice to the same individuals a measure
of its accuracy may be obtained from the correlation of the two sets
of scores. This correlation is known as the reliability coefficient; for
the $i$th test we shall denote it by $r_i$. If the test score is thought of as
made up of two parts, a true score, whose variance we shall take as
unity, and a random error of variance $\sigma_i^2$, it is easy to see that

$$r_i = \frac{1}{1 + \sigma_i^2},$$

whence $\sigma_i^2$ may be determined from the data. The random errors are
supposed to be independent in the several tests. For the four tests
which we have been using as an example the reliability coefficients
given by Kelley are, in order

    .9197, .8942, .9083, .5639.

Hence we find

$$\sigma_1^2 = .0873, \quad \sigma_2^2 = .1183, \quad \sigma_3^2 = .1010, \quad \sigma_4^2 = .7734.$$
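
The step from the reliability coefficients to the error variances is a matter of solving $r_i = 1/(1 + \sigma_i^2)$ for $\sigma_i^2$. A minimal Python sketch, with variable names chosen here for illustration, is:

```python
# Reliability coefficients of the four tests, in order.
reliabilities = [0.9197, 0.8942, 0.9083, 0.5639]

# Solve r_i = 1 / (1 + sigma_i^2) for the error variance sigma_i^2.
error_variances = [1 / r - 1 for r in reliabilities]
```

The four values agree with those printed above to four decimal places.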


Since the covariance of the ith and jth tests is the same whether 
based on the true or the total scores, the correlations will differ in the 




two cases, that based on the true score being larger in the ratio
$\sqrt{(1 + \sigma_i^2)(1 + \sigma_j^2)}$. If $r'_{ij}$ is the correlation between the observed
scores, the correlation between true scores is estimated as

$$r_{ij} = \frac{r'_{ij}}{\sqrt{r_i r_j}} = r'_{ij} \sqrt{(1 + \sigma_i^2)(1 + \sigma_j^2)} \qquad (i \ne j; \text{ not summed for } i \text{ or } j), \tag{32}$$


and is known as a correlation corrected for attenuation. The exact 
sampling distribution of this quantity has never been determined, but 
for sufficiently large samples it may be used with confidence, subject 
to the validity of the assumption that the errors of measurement in the 
several tests are uncorrelated with the test scores and with each other. 
Since we wish to deal with real quantities as far as possible, we have
based our analysis into principal components upon these corrected
coefficients $r_{ij}$.
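
The two forms of (32) may be compared numerically. In the Python sketch below the observed correlation .60 is a hypothetical figure chosen only for illustration, as are the function names; the reliabilities are those of the first two tests.

```python
import math

def corrected_from_reliabilities(r_observed, rel_i, rel_j):
    """First form of (32): divide by the square root of the product of reliabilities."""
    return r_observed / math.sqrt(rel_i * rel_j)

def corrected_from_error_variances(r_observed, var_i, var_j):
    """Second form of (32): multiply by sqrt((1 + sigma_i^2)(1 + sigma_j^2))."""
    return r_observed * math.sqrt((1 + var_i) * (1 + var_j))

rel_1, rel_2 = 0.9197, 0.8942
r_observed = 0.60                      # hypothetical observed correlation

first = corrected_from_reliabilities(r_observed, rel_1, rel_2)
second = corrected_from_error_variances(r_observed, 1 / rel_1 - 1, 1 / rel_2 - 1)
```

Both forms give the same corrected value, which is necessarily larger than the observed correlation whenever the reliabilities are less than unity.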

If the number of independent components of the n true scores 
is less than n, the scatter diagram of the true scores will lie in a flat 
space of smaller dimensionality immersed in the n-dimensional space. 
The scatter diagram of the observed scores will however be n-dimen- 
sional in character, since the scatter diagram of the errors of measure- 
ment is n-dimensional. The scatter diagram corresponding to the 
correlations corrected for attenuation would in this case be of the 
smaller dimensionality if calculated from the whole population; but 
on account of the fluctuations of the errors of measurement this 
would not in general be true for samples. We are therefore interested 
in comparing the variances of the principal components we find, and 
particularly the least of these variances, with those to be expected 
on the basis of the reliability coefficients. 

Now by (21), we have upon omitting the subscript $j$,

$$\gamma = \frac{a_1 z_1 + a_2 z_2 + \cdots + a_n z_n}{k}. \tag{33}$$

The variance of this expression which results from errors of
measurement is

$$\sigma_\gamma^2 = \frac{a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \cdots + a_n^2 \sigma_n^2}{k^2}.$$

On the hypothesis that $\gamma$ has no real existence, but arises purely
from errors of measurement, its variance, which we have taken as
unity, should differ from $\sigma_\gamma^2$ only on account of fluctuations in these
errors. On this hypothesis the equation

$$k = \bar{k},$$







where

$$\bar{k}^2 = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \cdots + a_n^2 \sigma_n^2,$$

should fail to be satisfied only in so far as $k$ and $\bar{k}$ are affected by
random sampling errors. These two quantities appear to be
uncorrelated with each other; for although the $k$’s are derived from
correlation coefficients corrected for attenuation with the help of the
data from which the $\bar{k}$’s are deduced, still this process is analogous
to that in Fisher’s analysis of variance of subtracting from the total
variance the intraclass variance before comparing with the interclass
variance, which is independent of this difference. If we rely upon this
analogy, we may compare $k$ with $\bar{k}$ simply by adding their variances,
provided the samples are large enough to allow the difference to be
treated as normally distributed.

Instead of comparing $k$ with $\bar{k}$ directly, we might use any function
of $k$ and the corresponding function of $\bar{k}$. In choosing among such
functions as $k$, $k^2$, $\sqrt{k}$, and $\log k$, it is to be recalled that $k$ is of the
nature of a variance, as pointed out at the end of the last section.
In comparing estimates of variance, the logarithm is used by R. A.
Fisher because its standard error is independent of the value of the
variance in the population, and depends only on the number of cases
in the sample, or, more generally, upon the number of degrees of
freedom on which the estimate of variance is based. Against this
advantage must be set the fact that the square root of an estimate
of variance has a more nearly normal distribution than either the
estimate of variance itself or its logarithm, and the use of the standard
error presupposes a normal distribution. Further, $\log \bar{k}$ has a standard
error which may be found approximately from the definition of $\bar{k}$
above, and which lacks the property of depending only on the number
of cases, the $a$’s and $\sigma$’s being involved in it also. The advantage of
the logarithm is thus lost when a comparison is made with $\bar{k}$, but the
gain in accuracy in the use of $\sqrt{k}$ persists, though the higher moments
of the various functions of $\bar{k}$ have not yet been investigated.
Consequently we shall make our comparisons in terms of the square roots.
Since

$$\sqrt{\bar{k}} = (\Sigma a_i^2 \sigma_i^2)^{1/4},$$

we have, apart from terms of higher order, the following relation
between deviations of sample from population values:

$$\delta\sqrt{\bar{k}} = \tfrac{1}{2} (\Sigma a_i^2 \sigma_i^2)^{-3/4} (\Sigma a_i^2 \sigma_i \, \delta\sigma_i).$$




The mean value of this expression is zero, provided the estimate of
$\sigma_i^2$ is without bias. This will be the case if $\sigma_i^2$ is calculated as the
ratio of the sum of the squares of the differences between test
and retest to $2N$; this method of calculation appears to be
approximately equivalent to that from reliability coefficients. Neglecting
any bias, then, the variance of $\sqrt{\bar{k}}$ will be the mathematical
expectation of the square of the above expression. Since the errors of
measurement of the tests are independent of each other,

$$E\,\delta\sigma_i\,\delta\sigma_j = 0, \quad \text{if } i \ne j,$$

while the usual formula for the variance of the standard deviation
gives

$$E(\delta\sigma_i)^2 = \frac{\sigma_i^2}{2N}.$$

Making these substitutions after squaring $\delta\sqrt{\bar{k}}$, we obtain finally,

$$\sigma^2_{\sqrt{\bar{k}}} = \frac{\Sigma a_i^4 \sigma_i^4}{8 N \bar{k}^3}.$$

To this we add in each case

$$\sigma^2_{\sqrt{k}} = \frac{k}{2N}$$

to obtain the variance of $\sqrt{k} - \sqrt{\bar{k}}$. The results for the four tests
based on one hundred forty cases which we have been considering
are given in the following table. The variances found for $\sqrt{k}$ were
all considerably higher than those for $\sqrt{\bar{k}}$, the ratios to the latter
ranging from 2)% to sixty-seven. 








    Principal                                          Ratio of difference
    component      $k$      $\sqrt{k}$   $\sqrt{\bar{k}}$     to standard error
        1         1.846     1.359      .801              6.72
        2         1.465     1.210      .814              5.28
        3          .521      .722      .665              1.31
        4          .167      .406      .413              -.28

















The value for the fourth component is actually less than the 
value to be expected on the basis of errors of measurement, while 
that for the third component does not significantly exceed the expected 
value. For the other two, however, the excess is decidedly significant. 
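
The comparison of $\sqrt{k}$ with $\sqrt{\bar{k}}$ may be retraced as follows. This Python sketch is an illustration under stated assumptions: it takes the coefficients from the table of principal components, the error variances from the reliability coefficients, and the two variance formulas as reconstructed above; since all of these are rounded to three or four places, the ratios it yields differ somewhat from the printed ones in the last figures.

```python
import math

N = 140
error_var = [0.0873, 0.1183, 0.1010, 0.7734]       # sigma_i^2 for the four tests
roots = [1.846, 1.465, 0.521, 0.167]               # k for each component
coeffs = [                                         # a_i for each component in turn
    [0.818, 0.695, 0.608, 0.578],
    [-0.438, -0.620, 0.674, 0.660],
    [-0.292, 0.288, -0.376, 0.459],
    [0.240, -0.229, -0.193, 0.143],
]

rows = []
for k, a in zip(roots, coeffs):
    s = sum(ai ** 2 * v for ai, v in zip(a, error_var))   # sum a_i^2 sigma_i^2
    kbar = math.sqrt(s)                                    # kbar^2 equals that sum
    var_sqrt_kbar = sum(ai ** 4 * v ** 2 for ai, v in zip(a, error_var)) / (8 * N * kbar ** 3)
    var_sqrt_k = k / (2 * N)
    diff = math.sqrt(k) - math.sqrt(kbar)
    ratio = diff / math.sqrt(var_sqrt_k + var_sqrt_kbar)
    rows.append((math.sqrt(k), math.sqrt(kbar), ratio))
```

The sketch reproduces the pattern of the table: a large excess for the first two components, an excess of little more than one standard error for the third, and a small deficiency for the fourth.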










We conclude that the true scores, if we could find them, would display 
a scatter diagram of at least two dimensions, i.e. that there are at 
least two genuine independent components; but this experimental 
material supplies no evidence for more than two independent com- 
ponents. Thus we can definitely affirm the existence of two such 
components, though we cannot distinguish definitely between them 
with one hundred forty individuals. In the scatter diagram, we 
can fix with some definiteness the plane of these two leading com- 
ponents, but on account of their nearly equal contributions to the 
variance we cannot be at all sure of their directions within the plane. 
The ellipses of the scatter diagram are too nearly circular for this. 
It is possible, but far from certain on this evidence, that they are 
really three-dimensional, close in form to oblate spheroids. 


(To be concluded in October issue.)




COMPARABLE TESTS AND RELIABILITY 


JACK W. DUNLAP 


Fordham University 


The quantitative treatment of data for theoretical or practical 
purposes requires a knowledge of the reliability and validity of the 
measures used. Bound up closely with these two problems is that of 
the objectivity of the measures. The validity of a measure is affected 
by its reliability, and its reliability is in turn affected by the subjectiv- 
ity of the measures. The problem of validity will not be dealt with 
here. Our problem is to devise a criterion for comparable tests and 
to examine the various coefficients proposed as measures of reliability 
for a test. 

The reliability of a test has been defined by Ruch and Stoddard¹
as “the accuracy with which the test measures what it does measure.”
Kelley² attaches a similar meaning to the reliability coefficient, although
he injects into his argument certain concepts which the writer believes
to belong essentially under validity. Kelley’s views on the subject are
illustrated by the following quotations:


“Let me hasten to state that in a given computation test there may be problems
which are possessed of features not found in any other problems in the same test, 
which features may still be related to the general field of computation. In so far 
as this is true the test will tend to have a reliability for the examiner’s purposes 
which is greater than the $r_{1I}$ given by the Spearman-Brown Formula.”

“If an exercise contains some feature which is absolutely unique, the fact
that we do not measure it in obtaining a reliability coefficient is not only no 
drawback but a decided asset to one who uses the reliability coefficient in esti- 
mating the accuracy of a prediction based on the score on the exercise. 2. If an 
exercise contains a feature which is unique so far as the other exercises in the test 
are concerned, but not related to the general field of the subject, then, (a) the 
Spearman-Brown $r_{1I}$ will tend on the account to be too small, but (b) the feature
not being totally unique it should be possible (though at times difficult) to draw
up a second exercise measuring the same feature, thus the devising of a second
similar form of the test is a possibility. 3. If two or more exercises contain
common features, not found in the general field, then the Spearman-Brown $r_{1I}$
will tend on this account to be too large.”





1 Ruch, G. M. and G. D. Stoddard: ‘‘Tests and Measurements in High School 
Instruction.”” World Book Co., 1927, pp. 51. 
2 Kelley, T. L.: Note on the Reliability of a Test. A Reply to Dr. Crum’s 
Criticism. Jour. Educ. Psychol., Vol. XIV, 1924, pp. 193-204. 


Later, Kelley³ gives this definition: “The reliability coefficient is
the correlation of the scores of the same individuals upon two
successive similar tests.” The implication is that the reliability of a test is
the consistency with which it measures what it does measure.

This is a special case of Spearman’s⁴ original definition that the
term “reliability coefficient” is defined as “the coefficient between one
half and the other half of several measurements of the same thing.”

These concepts of reliability seem to be the most generally accepted
today, but there is another measure which has been proposed by
Kelley.⁵ This is the “index of reliability” given by Kelley as the
square root of the reliability coefficient. He states that it “is probably
an even more significant index of reliability” than the correlation
coefficient itself. It is from this statement that Monroe⁶ secures the
phrase “Index of Reliability” ascribing it to Kelley. This function
had, however, been developed earlier by Abelson⁷ as a special case
from Professor Spearman’s paper on reliability. Abelson does not,
however, state an algebraic formula for the function.

A concise demonstration of the algebra underlying the Index of
Reliability may be given as follows:

    $x_1$ = deviation score on one form of a test
    $x_2$ = deviation score on a second form of the test
    $x_\infty$ = true deviation score

and

$$x_1 = c_1 x_\infty + e_1$$
$$x_2 = c_2 x_\infty + e_2$$

in which $e_1$ and $e_2$ are chance errors and uncorrelated with each other
and with $x_\infty$, and $c_1$ and $c_2$ are constants relating to the units in which
$x_1$ and $x_2$ are measured. The reliability coefficient is

$$r_{12} = \frac{\Sigma x_1 x_2}{N \sigma_1 \sigma_2}. \tag{1}$$


3 Kelley, T. L.: “Interpretation of Educational Measurements.” World
Book Co., 1927, pp. 38.

4 Spearman, C.: Correlation Calculated from Faulty Data. British Jour. of
Psychol., Vol. III, 1910, pp. 281.

5 Kelley, T. L.: A Simplified Method of Using Scaled Data for Purposes of
Testing. School and Society, Vol. IV, 1916, pp. 34-71.

6 Monroe, W. S.: “Introduction to the Theory of Educational Measurements.”
Houghton Mifflin Co., 1923, pp. 206.

7 Abelson, A. R.: The Measurement of Mental Ability of Backward Children,
British Jour. of Psychol., Vol. IV, 1911, pp. 268-314.











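But

$$\Sigma x_1 x_2 = c_1 c_2 \Sigma x_\infty^2. \tag{2}$$

Assume that

$$\frac{c_1}{\sigma_1} = \frac{c_2}{\sigma_2};$$

then

$$r_{12} = \frac{c_1^2 \sigma_\infty^2}{\sigma_1^2}$$

and

$$(r_{12})^{1/2} = \frac{c_1 \sigma_\infty}{\sigma_1}. \tag{3}$$

The desired index of reliability is

$$r_{\infty 1} = \frac{\Sigma x_\infty x_1}{N \sigma_\infty \sigma_1} = \frac{\Sigma x_\infty (c_1 x_\infty + e_1)}{N \sigma_\infty \sigma_1} = \frac{c_1 \sigma_\infty}{\sigma_1}. \tag{4}$$

From equations (3) and (4) it follows that

$$r_{\infty 1} = \sqrt{r_{12}}. \tag{5}$$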


It should be noted that the assumption that the ratios $c_1/\sigma_1$ and
$c_2/\sigma_2$ are equal is simply the assumption that the two tests, $x_1$ and $x_2$,
are equally reliable.
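
The relation between the index of reliability and the reliability coefficient may be checked with arbitrary figures. In the Python sketch below the constant $c_1$, the true-score variance, and the error variance are hypothetical values chosen only for illustration; the observed variance is taken as $c_1^2 \sigma_\infty^2$ plus the error variance, which follows from the errors being uncorrelated with the true score.

```python
import math

# Hypothetical values, chosen only for illustration.
c1 = 2.0            # constant relating to the units of measurement
var_true = 1.5      # variance of the true deviation score
var_error = 1.0     # variance of the chance error e_1

# Observed variance, since the error is uncorrelated with the true score.
var_observed = c1 ** 2 * var_true + var_error

# Reliability coefficient under the assumption of equally reliable forms.
reliability = c1 ** 2 * var_true / var_observed

# Index of reliability: correlation of the observed with the true score, as in (4).
index = c1 * math.sqrt(var_true) / math.sqrt(var_observed)
```

Whatever figures are chosen, the square of the index equals the reliability coefficient, and the index exceeds the coefficient whenever the latter is less than unity.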

Cureton⁸ following Kelley² gives the definition: “The reliability
of a test is the ratio of the true variance to the obtained variance.”
We may consider an individual’s score on a test as composed of two
portions, his true score and an error of measurement. This “true”
score may be a measure of the combined effect of several functions.
If we assume that this true score and the error of measurement can
be represented approximately by a linear equation then transferring the
origins to the mean, we have as before

$$x_1 = c_1 x_\infty + e_1$$

where $c_1$ is a constant descriptive of the units of measurement in which
$x_1$ is measured and $e_1$ is the error. The reliability coefficient is $c_1^2 \sigma_\infty^2 / \sigma_1^2$.
Let us now examine the methods used for determining reliability
coefficients.

The reliability coefficient is usually determined by one of the 
following methods. If two forms of a test are available the correlation 





8 Cureton, E. E.: Error of Measurement and Correlation. Archives of Psychology,
No. 125, 1931.







Comparable Tests and Reliability 445 


of the scores of the same individual on two forms of the test is taken as
the coefficient of reliability. When we take the intercorrelation coefficient
as the reliability of either form we assume that the two forms are
equally reliable, provided we understand by reliability the definition
proposed by Cureton. More important here is the assumption that
the tests are equally valid and that they are comparable. If the two
forms of our test are comparable, i.e., possess the same degree of validity
and relative variability, we may use their intercorrelation as the
reliability coefficient. Note that such a coefficient is a function of (1)
the internal consistency of the test, (2) the day to day (quotidian)
variability of the individual, and (3) the length of the test (adequacy
of sampling of the ability).

The term “quotidian variability” has been used by Woodrow⁹
to designate the day to day variability in the individual. In a given
problem we may desire such a coefficient, but in other cases we desire
a measure of the consistency and comprehensiveness of the test free
from the quotidian variability.*

When only a single form of a test is available either of two things
may be done. We may repeat the application of the test, correlate the
paired scores and get a “retest” coefficient. Paterson et al.¹⁰ propose
this as the most satisfactory measure of reliability.

† That is, when $c_1/\sigma_1 = c_2/\sigma_2$, the relative variabilities are by definition
(assumption) equal. In this case,

$$\frac{c_1^2 \sigma_a^2}{\sigma_1^2} = \frac{c_2^2 \sigma_a^2}{\sigma_2^2}$$

and the two forms are equally reliable. We assume the same $a$ in each case (that
is, equal validity).

9 Woodrow, H.: Quotidian Variability. Psychological Review, Vol. XXXIX,
1932, pp. 245-256.

* We shall digress here to point out a current misapplication of this method
for determining reliability. The reliability of a rating scale of, for example, a
character trait is secured in one of two ways. First, the same judge is asked to
rate a group twice and the correlation between the two successive ratings determined.
This is not the reliability of the scale, but the consistency of the judge.
The second method is to have two judges rate the same group of individuals and
determine the correlation between their ratings. This is not the consistency
of a judge’s rating, but is a type of validity coefficient. Such a coefficient might
be termed a “coefficient of commonality of experience.” A trait rated the same
by all judges would be a valid estimate provided our judges were competent to
estimate that trait in the individual.

10 Paterson, D. G., R. M. Elliot, L. D. Anderson, H. A. Toops, and E. Heidbreder:
Minnesota Mechanical Ability Tests. Univ. of Minnesota, 1930, pp.
26-28.












They state: 


“The most satisfactory way of measuring reliability is to retest a group of 
individuals with exactly the same test formerly used, after a period of time suffi- 
ciently long to counteract effect from practice, and before there could be any 
considerable change in the subjects with respect to the trait tested . . . The 
correlation between the scores in the original tests and the retests may be used 
as a measure of the degree of correspondence between the two.” 


The use of the retest coefficient gives a value which has been spuri- 
ously affected by the correlation of errors. It is, in fact, the quotidian 
reliability. For example, an individual guesses today as to the answer
of a particular question. Tomorrow when presented with the same 
question he tends to give the response made previously rather than 
make a new and independent judgment. Further, even if he forgets 
the actual responses, there is still the error in the correlation due to the 
fact that the same item-sample is presented twice. The use of the 
retest coefficient assumes that we are interested primarily in the stabil- 
ity of the subject’s responses, that is his freedom from quotidian vari- 
ability. This is an entirely different problem from the determination 
of the internal consistency or reliability of the test. 

The other technique, when only a single form of a test is available,
is to split the test in halves, correlate the scores on the halves and then
estimate from this value by means of the Spearman-Brown⁴,¹¹ formula
the reliability of the total test. In taking split halves of a test, we
choose alternate items. Essentially we have two forms of the same 
test each half as long as the original. Due to the order in which the 
items are taken, the situation error is approximately the same for both 
halves, and the quotidian variability has been entirely eliminated. 
Other forms, however, of personal variability may remain. The 
correlation between the two halves assumes that the two halves are 
equally reliable. This is precisely the same assumption made above 
when dealing with two forms of a test. Theoretically it is possible 
to have two forms of a test where the reliability of each form deter- 
mined by the Spearman-Brown formula would be unity, yet the relia- 
bility determined by correlating the scores on the two forms is less 
than one. This condition is met when the tests are perfect samples 
and the subjects show quotidian variability. 
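
The split-half procedure described above may be sketched as follows. This is a minimal illustration, not a prescribed computation: the item matrix is hypothetical, the function names are assumptions, and the stepped-up value uses the Spearman-Brown formula for doubled length, 2r/(1 + r).

```python
import math


def pearson(u, v):
    """Product-moment correlation using divide-by-N moments."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n
    su = math.sqrt(sum((a - mu) ** 2 for a in u) / n)
    sv = math.sqrt(sum((b - mv) ** 2 for b in v) / n)
    return cov / (su * sv)


def spearman_brown(r_half):
    """Estimate whole-test reliability from the half-test correlation."""
    return 2.0 * r_half / (1.0 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one row per subject, one column per item, in test order."""
    # Index 0, 2, 4, ... are the 1st, 3rd, 5th, ... items (the odd half).
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    r_half = pearson(odd, even)
    return r_half, spearman_brown(r_half)


# Hypothetical right (1) / wrong (0) scores for five subjects on six items.
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

r_half, r_full = split_half_reliability(scores)
print(r_half, r_full)
```

Because both halves are taken at one sitting, a coefficient obtained in this way cannot reflect day to day variability, which is the point made in the text.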

It is assumed when we have two forms of a test that they are com- 
parable. One of the points made for splitting a test into halves by 





11 Brown, W.: Some Experimental Results in the Correlation of Mental 
Abilities. British Jour. of Psychol., Vol. III, 1910, pp. 296-322. 







the odd-even item method is that we have comparable forms. The
question arises, “When do we have comparable forms?” Kelley¹²
lays down the following rules for the construction of comparable tests.


“(1) sufficient fore-exercise should be provided to establish an attitude or set,
thus lessening the likelihood of the second test being different from the first, due
to a new level of familiarity with the mechanical features, etc. (2) the elements
of the first test should be as similar in difficulty and type to those in the second, pair
by pair as possible, but (3) should not be so identical in word or form as to commonly
lead to a memory transfer or correlation of errors.”


The similarity of the test or parts of a test as to content is determined
by an examination and rests on the judgment of the examiner.
In addition to these points certain statistical criteria should be met
before tests are designated as comparable. Ideally, all the directed
mean tetrads of item-pairs should equal zero. This point is of great
importance to workers who desire to construct a test which measures
B and nothing but B. It seems to the writer a more logical procedure
to build up empirically tests which measure a single factor and then
determine their validity, than to try to isolate the various factors
of our present heterogeneous tests. More easily determined is the
point that the tetrads resulting from the intercorrelations from four
forms (or four fourths of a single form) should all equal zero within the
sampling error.*

Four forms are necessary to secure the six intercorrelations used
in the tetrads. If only a single form of a test is available it may be
broken into fourths. If two forms are available, each may be split
into halves, thus giving the necessary four tests. In case four or more
forms are available, all the intercorrelations should be calculated, and
the resulting tetrads should all be shown to equal zero.

Certain important points can be shown by using various ones of 
these groups. First, however, let us examine the proof for using the 
tetrad as a criterion of internal consistency. 

Spearman and Hart¹³ have shown that when four variables may
be thought of as due to one general factor plus four specific factors, 


12 Kelley, T. L.: “Statistical Method.” Macmillan Co., 1924, pp. 203.

* Dr. E. E. Cureton has pointed out that the means and sigmas may be entirely
different, if the difference is due to different units of measurement. All that is
necessary for mathematical comparability is that $c_1/\sigma_1 = c_2/\sigma_2$. Thus the
Thorndike Handwriting Scale and the Ayres Writing Scale are comparable, but
have different units of measurement.

13 Hart, B. and C. Spearman: General Ability; Its Existence and Nature.
Brit. Jour. of Psychology, Vol. V, pp. 51-84, 1912.







the resulting tetrads will equal zero. The proof runs as follows. As
before, let

$$x_1 = c_1 a + e_1$$
$$x_2 = c_2 a + e_2$$
$$x_3 = c_3 a + e_3$$
$$x_4 = c_4 a + e_4 \qquad (6)$$

Cross multiplying, summing, dividing by N and noting that the e’s
are uncorrelated with each other and with $a$, and defining $a_1$ as $c_1\sigma_a/\sigma_1$,
$a_2$ as $c_2\sigma_a/\sigma_2$, etc. (Kelley¹⁴ gives a thorough and clearly presented
proof of this point) we have

$$r_{12} = a_1 a_2 \qquad r_{23} = a_2 a_3$$
$$r_{13} = a_1 a_3 \qquad r_{24} = a_2 a_4 \qquad (7)$$
$$r_{14} = a_1 a_4 \qquad r_{34} = a_3 a_4$$

and

$$t_{1234} = r_{12} r_{34} - r_{13} r_{24} = 0$$
$$t_{1243} = r_{12} r_{34} - r_{14} r_{23} = 0 \qquad (8)$$
$$t_{1342} = r_{13} r_{24} - r_{14} r_{23} = 0$$


Now this is the situation posited for comparable tests: Namely 
that each measures the same thing and to the same degree. Whatever 
test 1 measures, a may or may not be an element incapable of further 
analysis, but it should be the same a that the other tests measure. 
If this is what is desired in comparable tests, then Spearman’s tetrad 
equations give the needed mathematical criteria. 
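
The tetrad criterion can be illustrated with a short Python sketch. The function names and the factor loadings below are assumptions made for the example; the three differences are the standard tetrad differences among four variables, and the sketch makes no allowance for sampling error, which the text requires in practice.

```python
def tetrad_differences(r):
    """r maps a pair (i, j), i < j, of tests 1..4 to their intercorrelation."""
    t1234 = r[(1, 2)] * r[(3, 4)] - r[(1, 3)] * r[(2, 4)]
    t1243 = r[(1, 2)] * r[(3, 4)] - r[(1, 4)] * r[(2, 3)]
    t1342 = r[(1, 3)] * r[(2, 4)] - r[(1, 4)] * r[(2, 3)]
    return t1234, t1243, t1342


def single_factor_correlations(loadings):
    """Intercorrelations implied by one common factor: r_ij = a_i * a_j."""
    return {
        (i + 1, j + 1): loadings[i] * loadings[j]
        for i in range(4)
        for j in range(i + 1, 4)
    }


# Hypothetical loadings a_1..a_4 of four comparable forms on one factor.
r = single_factor_correlations([0.9, 0.8, 0.7, 0.6])
tetrads = tetrad_differences(r)

# A specific bond between tests 1 and 2 (for example, two halves of one
# form taken at the same sitting) raises r_12 and disturbs the tetrads.
r_disturbed = dict(r)
r_disturbed[(1, 2)] += 0.10
tetrads_disturbed = tetrad_differences(r_disturbed)

print(tetrads)
print(tetrads_disturbed)
```

In the first case all three differences vanish apart from rounding; in the second, the two tetrads that involve $r_{12}$ depart from zero while the third is unaffected.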

If we have two forms of a test each broken into two halves, it
seems almost certain that there will be specific correlation between
the two halves of each form due to the constant response or situation
error. If the test reliability, independent of the individual’s variability,
is desired, the Spearman-Brown formula should be used. The
four fourths of the single form should meet the criterion that $t_{1234}$,
$t_{1243}$, and $t_{1342}$ should all equal zero.

When, however, the reliability of the test-in-use, including the 
individual variability, is desired, four distinct and separate forms must 
meet the criterion. When the tetrads for the four halves of two forms, 
each form given at a different time, equal zero, the evidence is that the 
individual variability is negligible and that the Spearman-Brown 
formula will give exactly the same value as the correlation between the 
two forms. 





14 Kelley, T. L.: “Crossroads in the Mind of Man.” Stanford University Press,
1928, pp. 238 + vii.







If the four fourths, or the four forms of a test satisfy the tetrad 
criterion, we may proceed with assurance that the form of the equations 
assigned is satisfactory.* 

The tetrad test shows the equality of validity, that is the com- 
parability of content of the forms, but does not show equality of relia- 
bility. If tests have equal reliability, equal variability is superfluous, 
except for the special case of several forms in which it is desirable to 
use the intra-class r. 

The definition that the reliability of a test is the ratio of the true
variance to the obtained variance is more general than the above concept
of reliability. Cureton⁸ has pointed out that this definition does
not assume that the errors of measurement sum to zero, or that the
several forms be equally variable or reliable. The only assumption
is that the correlation between the trait and the errors of measurement
is zero.

Let $x_1$ and $x_2$ refer to the two halves of form m.

$$x_1 = c_1 a + e_1$$
$$x_2 = c_2 a + e_2 \qquad (9)$$

and we may write

$$\sigma_1^2 = c_1^2 \sigma_a^2 + \sigma_{e_1}^2 \qquad (10)$$
$$\sigma_2^2 = c_2^2 \sigma_a^2 + \sigma_{e_2}^2$$

$$r_{12} = \frac{c_1 c_2 \sigma_a^2}{\sigma_1 \sigma_2} \qquad (11)$$

We have defined the reliability as the ratio of the true variance to the
observed variance, and if we assume as above that $c_1/\sigma_1$ equals $c_2/\sigma_2$,
then

$$r_{12} = \frac{c_1^2 \sigma_a^2}{\sigma_1^2}$$

Here the reliability of half the test is defined in terms of itself. The
value $\sigma_1^2$ can be determined from the data directly, but it is necessary
to resort to other methods to evaluate $c_1^2 \sigma_a^2$.

Theoretically, it should be possible to determine the reliability of
an observed measure, whenever the tetrads formed from the intercorrelations
of the measure considered and any other three comparable
measures equal zero, following the method described below. From
equation (11) we have

$$r_1 = \frac{c_1^2 \sigma_a^2}{\sigma_1^2} = a_1^2 \qquad (12)$$





* Actually the tetrads obtained by correlating every item with every other 
item should all be zero within their sampling errors before there can be any con- 
siderable assurance that the obtained reliability is the true reliability. 










$$\frac{r_{12} r_{13}}{r_{23}} = \frac{(a_1 a_2)(a_1 a_3)}{a_2 a_3} = a_1^2 \qquad (13)$$

and combining equations (12) and (13) we have

$$r_1 = \frac{r_{12} r_{13}}{r_{23}} = \frac{r_{13} r_{14}}{r_{34}} \qquad (14)$$

(reliability of a single form 1).
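
Formula (14) is simple to compute once three intercorrelations are in hand. The sketch below is illustrative: the function name is an assumption, and the correlations are those implied by hypothetical single-factor loadings of 0.9, 0.8 and 0.7, so that the tetrad criterion holds by construction.

```python
def reliability_from_three_forms(r12, r13, r23):
    """Reliability of form 1 by formula (14): r_12 * r_13 / r_23."""
    if r23 == 0:
        raise ValueError("r23 must be non-zero")
    return r12 * r13 / r23


# Intercorrelations implied by hypothetical loadings a1=0.9, a2=0.8, a3=0.7.
r12, r13, r23 = 0.9 * 0.8, 0.9 * 0.7, 0.8 * 0.7

r1 = reliability_from_three_forms(r12, r13, r23)
print(r1)  # equals a1 squared, apart from rounding
```

With observed rather than constructed correlations the result is only as trustworthy as the evidence that the forms satisfy the tetrad criterion.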

Cureton⁸ proposed this formula as the measure of reliability of
form 1, but did not point out that it is not strictly applicable until it
has been demonstrated that the tetrad criterion is satisfied by the
forms used. This formula is of great importance when three or more
forms of the test are available. Now

$$r_1 = a_1^2 \quad \text{and} \quad r_2 = a_2^2 \quad \text{and} \quad r_{12} = a_1 a_2$$

so that the intercorrelation of two forms of a test (1 and 2) is

$$r_{12} = (r_1 r_2)^{1/2} \qquad (15)$$


If we take the correlation between two halves of a test as the 
reliability for each half, we are using a geometric mean which under- 
estimates one and overestimates the other. It should be noted that 
the larger of the coefficients will be underestimated more than the 
smaller coefficient will be overestimated. With reasonable precau- 
tions in selecting the two halves, this effect should be negligible. When 
$r_1$ equals $r_2$, then neither will be underestimated, but this will only hold
when $c_1/\sigma_1$ equals $c_2/\sigma_2$; that is, when the two tests are equally reliable.
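
The asymmetry noted above follows from the geometric mean in formula (15). A small numerical illustration, with reliabilities of 0.90 and 0.60 assumed purely for the example:

```python
import math

# Hypothetical true reliabilities of two unequally reliable halves.
r1, r2 = 0.90, 0.60

# Formula (15): the intercorrelation is the geometric mean of the two.
r12 = math.sqrt(r1 * r2)

underestimate_of_larger = r1 - r12
overestimate_of_smaller = r12 - r2

print(r12, underestimate_of_larger, overestimate_of_smaller)
```

Since the geometric mean of two unequal values always lies below their arithmetic mean, the larger coefficient is understated by more than the smaller is overstated.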

Let us examine the Spearman-Brown formula to see just what we 
have when the reliability is determined by this method. The Spear- 
man-Brown method gives us slightly higher coefficients of correlation 
than does the correlation-between-two-forms method, due to the fact 
that it does not include the quotidian variability, but does include the 
situation error. A working approximation of the test-reliability is 
secured by this method. 

Let m represent the form employed and 1 and 2 the odd and even
halves. Now

$$r_m = r_{(1+2)} = \frac{(c_1 + c_2)^2 \sigma_a^2}{\sigma_{(1+2)}^2} = \frac{(c_1 + c_2)^2 \sigma_a^2}{\sigma_1^2 + \sigma_2^2 + 2 r_{12} \sigma_1 \sigma_2} \qquad (16)$$

If we assume that the units of measurement ($c_1$ and $c_2$) and the standard
deviations are equal, formula (16) reduces to formula (17), the Spearman-Brown
formula. Note that the condition of equal variability is
assumed by the Spearman-Brown formula, though not by the correlation-between-forms
formula. If two tests are equally reliable measures
of the same thing, but unequally variable, the Spearman-Brown
formula breaks down, since the only way in which this can occur is for
them to be measured in different units.

$$r_m = \frac{2 a_1 a_2}{1 + a_1 a_2} = \frac{2 r_{12}}{1 + r_{12}} \qquad (17)$$
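
The relation between formulas (16) and (17) can be seen in a numerical sketch. The parameter values and function names below are assumptions for illustration; in both cases the halves are equally reliable, but only in the first are they measured in equal units.

```python
import math


def r12_from_model(c1, c2, sigma_a, sigma1, sigma2):
    """Correlation between halves under x_i = c_i * a + e_i, formula (11)."""
    return c1 * c2 * sigma_a ** 2 / (sigma1 * sigma2)


def reliability_of_sum(c1, c2, sigma_a, sigma1, sigma2):
    """Ratio of true to observed variance of the sum of halves, formula (16)."""
    r12 = r12_from_model(c1, c2, sigma_a, sigma1, sigma2)
    observed = sigma1 ** 2 + sigma2 ** 2 + 2 * r12 * sigma1 * sigma2
    return (c1 + c2) ** 2 * sigma_a ** 2 / observed


def spearman_brown(r12):
    """Formula (17)."""
    return 2 * r12 / (1 + r12)


sigma_a = 2.0

# Case 1: equal units and equal standard deviations (error variance 1 each).
equal = reliability_of_sum(1.0, 1.0, sigma_a, math.sqrt(5.0), math.sqrt(5.0))
equal_sb = spearman_brown(r12_from_model(1.0, 1.0, sigma_a,
                                         math.sqrt(5.0), math.sqrt(5.0)))

# Case 2: the second half is scored in units twice as large, so that
# c2 = 2 and sigma2 = 2 * sigma1; the halves remain equally reliable.
unequal = reliability_of_sum(1.0, 2.0, sigma_a, math.sqrt(5.0), math.sqrt(20.0))
unequal_sb = spearman_brown(r12_from_model(1.0, 2.0, sigma_a,
                                           math.sqrt(5.0), math.sqrt(20.0)))

print(equal, equal_sb)
print(unequal, unequal_sb)
```

In the first case the two formulas agree; in the second the correlation between halves is unchanged, yet formula (16) gives a lower value than the Spearman-Brown estimate, which is the breakdown referred to in the text.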


When two forms of a test (m and n) are available, the correlation
between the two may be determined. Now in terms of the halves

$$r_{mn} = r_{(1+2)(3+4)} = \frac{(c_1 + c_2)(c_3 + c_4) \sigma_a^2}{\sigma_{(1+2)} \sigma_{(3+4)}} \qquad (18)$$

The right side of formula (18) is a reliability formula, not a correlation
formula. Formula (18) is strictly true only if the forms m
and n are equally reliable. To reduce formula (18) to formula (17),
the Spearman-Brown formula, it is necessary to assume that the c’s
are equal, that the σ’s are equal, and that $r_{12}$ equals $r_{34}$. It is evident
that more assumptions are necessary than above. It is also assumed
as before that the errors of measurement are uncorrelated.

Formula (18) can be evaluated in terms of the variances when the
halves of the two forms meet the tetrad criterion. The variance of
(1 + 2), form m, can be obtained directly from the standard deviations
and intercorrelations of the halves. It is now necessary to evaluate
$(c_1 + c_2)^2 \sigma_a^2$. From formulas (12) and (14) we have

$$c_1 \sigma_a = \sigma_1 \left( \frac{r_{12} r_{13}}{r_{23}} \right)^{1/2} \qquad (19)$$

and

$$c_2 \sigma_a = \sigma_2 \left( \frac{r_{12} r_{23}}{r_{13}} \right)^{1/2} \qquad (20)$$


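Substituting the values from equations (19) and (20) in equation
(16) and expanding gives the reliability of form m.

$$r_m = \frac{\sigma_1^2 \left( \frac{r_{12} r_{13}}{r_{23}} \right) + \sigma_2^2 \left( \frac{r_{12} r_{23}}{r_{13}} \right) + 2 \sigma_1 \sigma_2 r_{12}}{\sigma_1^2 + \sigma_2^2 + 2 \sigma_1 \sigma_2 r_{12}} \qquad (21)$$
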


This measure, formula (21), is the one that should be used whenever 
we have a second test that can be used for reference. The reliability 




of form m has been expressed as the ratio of the true variance to the 
observed variance. 
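
Formula (21) may be sketched in Python as below. The function name and the numerical values are assumptions: the standard deviations are set to unity and the correlations are those implied by hypothetical loadings of 0.9, 0.8 and 0.7, where 1 and 2 are the halves of form m and 3 is the reference measure.

```python
def reliability_with_reference(sigma1, sigma2, r12, r13, r23):
    """Reliability of form m = (1 + 2) by formula (21).

    sigma1, sigma2: standard deviations of the two halves of form m.
    r12: correlation between the halves; r13, r23: correlations of each
    half with a third, comparable measure used for reference.
    """
    true_var = (
        sigma1 ** 2 * (r12 * r13 / r23)
        + sigma2 ** 2 * (r12 * r23 / r13)
        + 2 * sigma1 * sigma2 * r12
    )
    observed_var = sigma1 ** 2 + sigma2 ** 2 + 2 * sigma1 * sigma2 * r12
    return true_var / observed_var


# Correlations implied by hypothetical loadings a1=0.9, a2=0.8, a3=0.7.
r12, r13, r23 = 0.72, 0.63, 0.56
r_m = reliability_with_reference(1.0, 1.0, r12, r13, r23)
print(r_m)
```

Under these assumed values the numerator reduces to the square of the sum of the two loadings, so the result is the ratio of true to observed variance of the whole form, as the text states.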

If we accept as the definition of reliability the ratio of the “true”
variance to the observed variance, then the Spearman-Brown formula
offers a better estimate of the reliability of a test than the intercorrelation
with another form. This statement rests upon the meaning
of “true” variance, and holds if we mean by “true” variance the
“true instantaneous variance,” that is, the variance at the time of
testing. If by “true” variance one means the “true average variance,”
then the best estimate of reliability is obtained from the intercorrelation
of two or more forms given on different days.

The Spearman-Brown formula thus gives the reliability of the
test relatively independent of the reliability of the subjects. It
should be noted that the true score of the Spearman-Brown formula
is the “true” ability at the instant, while the “true” score of the
intercorrelation of two forms is the true underlying ability or average
ability of the subject. This last has, perhaps, more meaning psychologically
and in pedagogical practice. When it can be demonstrated
that four forms or four fourths are comparable and valid alternatives,
the reliability should be determined by formula (14).

Satisfying the tetrad criterion indicates that the correlation has 
not been materially affected by the correlated situation errors, and 
that the effects of quotidian variability are negligible. 


SUMMARY 


1. The reliability of a single form of a test is the ratio of the
“true” variance to the obtained variance.

2. The “true” variance means the variation in all factors that
affect the scores, other than the errors of measurement of the test.
An individual’s “true” score does not mean his underlying ability
in the trait measured, free from day-to-day variability and hour-to-hour
fluctuations, but his ability at the time he is tested, including
all the mental and physical influences that affect his mental efficiency
and test performance at that moment.

3. The use of the correlation between two forms of a test as the 
measure of reliability of one of them involves more assumptions than 
the Spearman-Brown formula for determination of the reliability of 
a single form, if true variance and reliability be defined as above. 
Particularly it assumes that quotidian variability is absent, in addition 
to the Spearman-Brown assumptions. 







4. The tetrad technique offers a means of determining whether or 
not the split fourths (or four forms) of a test measure the same thing. 

5. When the four forms of the test satisfy the tetrad criterion 
we can determine the reliability of a particular form in terms of its 
observed and true variances, from itself and any two of the other
forms (see formula 14).

6. The Spearman-Brown formula will give a very close approxi- 
mation to the reliability of the total form, as split halves will in general 
be approximately equally reliable. 

7. The numerical calculation of the various values for a reliability 
coefficient is simple, requiring only the standard deviations and 
intercorrelations. 







SOME TONAL DETERMINANTS OF MELODIC 
MEMORY 


OTTO ORTMANN 
Research Department, Peabody Conservatory of Music 


The training of the ear in melodic memory forms an essential part 
of the music training of children. The following test was made, as a 
preliminary survey, in order to learn what causes the difficulties in 
melodic memory which are commonly met with in classroom situations. 

A series of short melodic phrases was given to a group of classes, 
relatively unselected as to age, intelligence, and amount of training. 
All were students of music, familiar with the procedure of this test. 
The examples were played on the piano, once each, and the pupils, 
immediately after each example, wrote on printed forms what they 
heard. It is thus a test in immediate recall. The phrases were played 
at a uniformly slow tempo and the first tone was given in order to 
eliminate the need for so-called ‘‘absolute pitch.” 

Five-toned examples were used because previous experimentation 
had proved this number of tones to be sufficiently short to permit some 
perfect answers and sufficiently long to bring out individual differences 
and especially the determinants which were sought. The use of short 
melodic phrases with uniformly long notes is advisable in order to 
reduce to a minimum harmonic and rhythmic complications. Many 
studies on the psychology of melody have assigned to melody, attri- 
butes that are really harmonic or rhythmic contributions. It is for 
this reason that the results obtained in this test cannot be applied in 
toto to full phrases as we find them in musical literature. But whatever 
modification will be necessary is the result of harmonic and rhythmic 
determinants, not melodic. 

Figure 4 gives the complete series with the error distributions for 
each tone, each dot representing one error. A study of these reveals 
the operation of certain basic factors which I have called determinants. 
(For a detailed treatment of the fundamental attributes of melody, 
the reader is referred to the author’s study: On the Melodic Relativity 
of Tones. Psychological Monographs, Vol. XXXV, No. 1.)


I. REPETITION 


The pitch series, upon which all melody is necessarily based, may be 
considered a one-dimensional series, motion in which, for our purposes 





Tonal Determinants of Melodic Memory 455 


here, may be described as up or down. Repetition of a tone, therefore,
means zero motion. Since pitch-motion is the most fundamental 
aspect of melody, the discrimination between motion and non-motion 
becomes the most fundamental discrimination. And, in turn, tone- 
repetition should become the most easily recognized attribute of 
melody. 


Fig. 1. [Musical examples.]


We can test for this by using melodic phrases containing an element 
of direct repetition: Exs. 2, 8, 12, 14; and, in modified form, Exs. 
1, 6, 10, 15, 16, 19. In the examples containing immediate repetition,
this element was missed only six times in 724, an error of less than one 
per cent frequency. Conversely, in 2064 examples not containing 
direct repetition, this was introduced by pupils only four times, an 
error of approximately one-fifth of one per cent. 
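
The two error frequencies just quoted can be verified by simple arithmetic. The short Python check below uses only the counts given in the text; the function name is an assumption.

```python
def percent(errors, trials):
    """Error frequency expressed as a percentage of trials."""
    return 100.0 * errors / trials


# Repetition missed: six times in 724 examples containing direct repetition.
missed = percent(6, 724)

# Repetition wrongly introduced: four times in 2064 examples without it.
introduced = percent(4, 2064)

print(round(missed, 2), round(introduced, 2))
```

Both figures agree with the statements in the text: the first is under one per cent and the second is close to one-fifth of one per cent.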










Further proof of the fact that we are here dealing with the most 
fundamental aspect of melody, is given by the frequency with which the 
element of repetition is retained in misplaced position, at times with a 
radical alteration of melodic outline or pitch-contour. 

Figure 1 illustrates some variants found for the given examples. 
In each case a repeated tone is present in spite of errors in pitch-range 
of a seventh. In one version of Ex. 12, the entire melody is inverted 
and the intervals changed, yet the element of repetition is retained. 
Even the presence of an intermediate tone (Ex. 10) does not eliminate
the recognition and memory of repetition. Variants similar to
those given for Exs. 10 and 16, Fig. 1, were found equally frequently 
for all other examples containing interrupted repetition: Exs. 1, 6, 
15, 19 (see Fig. 4). 

The frequency with which repetition was retained in the most 
diverse tonal environments, and its absolute frequency when compared 
to the frequencies of errors yet to be considered, make tone-repetition 
or pitch-repetition the first determinant of melodic memory. 


II. PITCH-DIRECTION 


Whereas, in tone-repetition, the choice was simply between same- 
ness and difference, or motion and non-motion, pitch-direction, the 
next determinant, forces a choice between ascent and descent, or a 
choice of direction of motion. 

If we examine the examples containing only one pitch-direction: 
Fig. 4, Exs. 3, 5, 7, 11, 13, 17, we find that a change in direction was 
added only twenty-two times in 691 examples, an error of three per cent 
frequency. No instance was found in which an entirely ascending 
series or an entirely descending series was reversed. This would not 
be the case if pitch-direction were not a fundamental attribute of 
melody. 

If we count only the first interval, pitch-direction was reversed in 
only four cases out of 2580, although many errors in the size of the 
interval occurred. 

The most interesting proof, however, of the fundamentality of 
pitch-direction is given by the frequency with which variations from 
the model occur, all of which retain the proper ascent-descent relationships.
Some of these have already been given in Fig. 1. A few others
are given in Fig. 2. Here, in spite of pitch-errors ranging from a 
second to a sixth, the pitch-contour remains correct: Ascent remains 
ascent, descent remains descent. 




As we increase the number of changes in pitch-direction in an exam- 
ple, we break this unity, complicate the pitch-outline, and therefore 
make the recognition and retention of each change more difficult. 
In examples containing one pitch-change this was missed in a given
class-group in one and seven-tenths per cent of cases; in examples containing
two changes, twenty-two per cent; in examples containing three
changes, thirty-two per cent. For all classes tested and all examples, 
the average retention of all phases of pitch-contour was approximately 
seventy-five per cent. The increase in difficulty is much greater 
between one and two changes in pitch-direction than between two and 








Fig. 2. [Musical examples.]


three changes. That is to say, a direction division of a phrase into 
two parts offers little or no difficulty; a division into three parts 
introduces considerable difficulty, even in a short phrase of five tones. 

The extent to which the factor of symmetry affects recognition and 
retention of changes in pitch-direction cannot be determined from the 
data here used, since we cannot eliminate the element of repetition. 
The symmetrical examples—those in which the descending part is 
symmetrical to the ascending—Exs. 1, 6, 10, 15, 16, 19, all necessarily 
contain repetition. Speaking generally and very guardedly, however, 
it seems that such examples are somewhat easier to retain than those in 
which the change in pitch-direction is asymmetrical: C-E-F-G-C or
C-G-A-B-C. 

The recognition and retention of changes in pitch-direction are 
further influenced by the absolute value or amount of change and also 
by the relative value in terms of the tonal environment. Changes 
involving only step-wise progression C-D-C or C-B-C are, other things 
equal, less noticed than changes involving skips: C-F-C or C-G-C; 
and these, in turn, are less noticed than wider skips: C-B-C or C-D-C. 




In Ex. 4, the change at the second tone was missed twice, that on 
the third tone, eleven times; in Ex. 9, that at F, five times, that at D, 
four times; in Ex. 18, that at F, six times, that at A, thirteen; in Ex. 20, 
that at E, four times, that at B, eleven. The frequency of this error 
thus varies with the size of the intervals at the point of change. 

In five-toned examples, the functioning of the relative value of 
pitch-change is very limited on account of the shortness of the examples. 
Slight indications are found in Ex. 4 where the ratio of error at E, 
compared to that at G, is higher than for other less pronounced 
progressions. Moreover, tests made with seven-toned examples 
showed a more marked difference. In the following

[two seven-toned musical examples]

the ratio which the change at D in the second example bears to that
at D in the first example, in frequency of being noticed, was approximately
seven to one. The step-wise difference in the first example is
“dwarfed” by the size of the preceding intervals.




















Fig. 3. [Visual projection of the pitch outlines of the examples.]


Both the absolute and relative values of such pitch-changes can 
best be seen by projecting visually the examples we have used, taking 
the up-down aspect of the pitch series for the vertical dimension and 
temporal order as the horizontal dimension. This is done in Fig. 3. 







The agreement between the basic pitch attributes and those of such a 
visual projection is so marked, that a study of Fig. 3 will at once 
bring to light most of the salient features of the melodies we have been 
studying. What “stands out” for the eye in the outlines of Fig. 3, 
“stands out” also for the ear when the melodies are played; what 
escapes the eye at first glance also escapes the ear at first hearing. 
Accordingly, pupils react to the major outlines of a melody first, later 
on, to the details. (The two senses, of course, have also salient points 
of difference.) 

Finally, we must consider ascent versus descent. On account of the 
fundamentally ascending basis of our tonal system (scales and chords 
are constructed ascendingly) we may expect to find the ascending 
examples (1 to 10) somewhat easier than the descending (11 to 20). 
The actual percentile error distribution for all was forty-five per cent 
for ascending, fifty-five per cent for descending examples. For single 
intervals (first interval only) the ascending third, Ex. 5, was missed
twenty-seven times, the descending third, Ex. 13, was missed forty-two; 
the ascending fifth, Ex. 4, forty-eight times, the descending fifth, 
Ex. 18, seventy-one times. Although other determinants in some 
examples contribute to this distribution, it occurs with sufficient 
constancy and in sufficient degree to make descending intervals gener- 
ally somewhat more difficult than ascending intervals. 

The frequency of error in pitch-direction or melodic outline points 
to this as the second determinant of melodic memory. 


III. CONJUNCT-DISJUNCT MOTION 


The next most fundamental distinction made in the raw material 
of our tonal system is the division into step-wise (conjunct) and skip- 
wise (disjunct) progression. Consequently, we may expect to find this 
distinction one of the determinants of melodic reaction. 

For proof we need look no further than Ex. 1 and Ex. 15. They 
are the only very easy examples of the entire series. The total errors 
for both examples together were twenty-seven; the total errors for the next
easiest example alone were thirty-four. The fundamentality of
diatonic (stepwise) progression is shown further in the distributions 
for Ex. 3 and Ex. 11, in which the errors are concentrated on the skip 
(disjunctive) parts of the melody. It functions also when such exam- 
ples as Ex. 11 are heard as C-B-A-G-F. 

The number of times that the step of a second was incorrectly
written was approximately half that for the lowest error frequency
between two skips. If we array all examples of Fig. 4 on a scale from
easy to difficult, we find a marked preponderance of diatonic progression
for the easiest, and a gradual reduction of this as we increase the
difficulty, until, for the two most difficult examples, only a single
step-wise interval remains, all others being skips.

Fig. 4.

We have, then, as the third determinant of melodic reaction, step-wise
progression, or conjunct motion, as against skip-wise or disjunct
motion.


IV. DEGREE OF DISJUNCTIVENESS 


The determinant fourth in fundamentality is the degree-of-skip. 
That it is much less basic than any of the preceding three determinants 
is shown in the high frequency of errors made in the degree-of-skip 
discrimination. Whereas a second is seldom heard as a third or any 
other skip, a third is frequently heard as a fourth or fifth, and other 
wide intervals are heard incorrectly even more often. 

The frequency of the degree-of-skip error varies directly with the 
pitch distance of the given interval. A fifth is heard more frequently 
as a sixth or a fourth (neighboring intervals) than as a seventh or a 
third. In the examples here used, ascending and descending errors 
of one scale degree totalled eleven per cent; of two scale degrees 
four per cent; of three, one per cent; and of four, four-tenths of one per 
cent. That is to say, as we increase the distance from the given 
interval, we decrease the frequency of error. 

This applies, however, to melodic intervals only. If the two tones 
of an interval be given simultaneously (harmonically), the fusional 
aspect (concordancy and discordancy) modifies the melodic distribu- 
tion. (For a brief discussion of the harmonic aspect see the author’s: 
Notes on Interval Discrimination. Peabody Bulletin, May, 1932.) 
It is the playing over of this harmonic aspect into the melodic field 
that accounts for some of the octave transpositions seen in Fig. 4, 
especially for the final tones. 

We find, further, that narrow intervals have a smaller range of error 
than wide intervals: Errors for thirds varied from seconds to fifths; 
for fourths from seconds to sevenths; for fifths and sixths from seconds 
to octaves. This brings to light the characteristic difficulty of wide 
intervals. Training, accordingly, should proceed from narrow skips 
to wide. If the problem be discrimination between melodic intervals, 
the procedure should be from wide pitch differences between each pair, 
to small differences. 







In the light of these results, the fourth determinant of melodic 
reaction may be considered pitch-distance, or the degree of disjunct- 
motion in either pitch-direction. 


V. MISCELLANEOUS DETERMINANTS 


1. Order—An important element of any group of linear items is 
order. It accounts for the typical reversal of letters in word-spelling 
and that of figures in number-spelling. In tone-spelling it would like- 
wise show a reversal of tones: C-F-D-E-C might become C-F-E-D-C. 
Such a reversal, however, involves two changes in pitch-direction and 
this, as a major determinant, tends to emphasize the reversal. Accord- 
ingly, such reversals are not frequently encountered, so long as 
auditory memory is allowed to function. 

Reversals that do not involve changes in pitch-direction occur more 
frequently. In such cases the reversal concerns intervals. If 
C-E-F-A-C is reproduced as C-E-G-A-C, the interval of a second and 
that of a third have been reversed. No change in pitch-direction takes 
place and since the degree-of-skip determinant is much less basic than 
pitch-direction, interval reversal is more frequently found. 

The frequency with which errors of tone-reversal occur compared 
with that of tone-substitution (the introduction of a pitch not present 
in the given example) is so small that it scarcely functions as a deter- 
minant. For the entire twenty examples it was slightly more than 
one per cent. If we consider interval reversal, the ratio would be 
considerably higher. However, the fact that interval reversal occurs, 
does not necessarily mean that it caused the error. In examples of 
single ascent or descent, such as Exs. 3, 5, 7, 11, 13, 17, any attempt to 
arrange the five tones within an octave, would result in such a chance 
change of interval, of which the pupil need not be aware at all. 

The relative infrequency of order-error holds for auditory reten-
tion. If the pupil retains merely the letter-name order, without its 
melodic equivalent in sound, the reaction becomes exactly similar to 
the number series, and the frequency of reversal of items increases. 

Since the tones of a melody are necessarily successive, not simultaneous,
our reactions to the first and the last parts will differ. When the
first tone is given, the second tone, other things equal, is easier than the 
third tone, and this is easier than the fourth. Thus in Ex. 3 the dia- 
tonic progression C-D-E, which we have found to be easy, occurs first. 
In Ex. 7, the similar diatonic progression, A-B-C, occurs last. The 
distribution of errors is much greater for Ex. 7 than Ex. 3, a difference 







determined primarily, but not entirely, by the position of the group 
in the phrase as a whole. Using the three examples: C-D-E-G-C; 
C-E-F-G-C; C-F-A-B-C, which have the diatonic progression respec- 
tively at the beginning, in the middle, and at the end, the error dis- 
tribution for three classes was 7, 57, 50, showing the first to be much 
easier than the other two, and the position at the end to be slightly 
easier than that in the middle. 

The fact that pitch-outline is a major determinant means that 
when a single tone is heard incorrectly, the succeeding tone or tones 
will be incorrect in a great many cases. When C-F-G-G-C is heard 
C-G-A-A-D, the errors on tones 3, 4, and 5, result from the error on 
tone 2, because the melodic outline: Ascending step, tone-repetition, 
descending fifth, remains correct. This “order” error is, without
doubt, the most frequent single type that occurs. Consequently, it 
deserves much consideration in teaching. It shows conclusively the 
need for extensive drill on two-toned melodies (single interval) before 
attempting longer phrases. The importance of this interval prepara- 
tion in the dictation and recall of melody cannot be overemphasized. 
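The carry-over of a first-interval error can be illustrated with a short sketch. It is not part of the original study. It assumes that every tone lies within the one octave above C, and the names `DEGREE` and `diatonic_intervals` are hypothetical helpers introduced only for this illustration. Each interval is counted in scale steps, so that a step is 1 and a fifth is 4.

```python
# Hypothetical illustration; assumes all tones lie in the octave above C.
DEGREE = {"C": 0, "D": 1, "E": 2, "F": 3, "G": 4, "A": 5, "B": 6}


def diatonic_intervals(melody):
    """Return the signed scale-step distance between successive tones."""
    tones = melody.split("-")
    return [DEGREE[b] - DEGREE[a] for a, b in zip(tones, tones[1:])]


given = diatonic_intervals("C-F-G-G-C")  # example as dictated
heard = diatonic_intervals("C-G-A-A-D")  # version as written by the pupil

# The first interval differs, but the outline that follows
# (ascending step, repetition, descending fifth) is retained.
first_interval_only = given[0] != heard[0] and given[1:] == heard[1:]
```

Under these assumptions only the first interval differs between the two versions, which is why tones 3, 4, and 5 are all displaced although the pupil has retained the outline correctly.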

A further interesting instance of order transfer is seen in the 
examples of Fig. 1. In nine of the versions the repeated element is 
misplaced from the standpoint of order, but the element of repetition 
is retained. (A similar error occurs in word-spelling where the wrong 
letter is doubled.) 

2. Chord-structure.—The presence of harmonic relationships in our 
tonal system plays over into our melodic reactions to modify and 
sometimes obscure the determinants which we have considered. 
Melodically C-E-G-E-C is less “unified” than C-D-F-D-C, since
step-wise progression is the essence of melodic unity (Exs. 1 and 15). 
Yet the first group is so strongly associated with the harmonic idea: 
Triad-on-C, that the entire group is reacted to as a familiar higher- 
unit, thus making the example much easier than it otherwise would be. 
In the second example any harmonic unity, such as inversion of incom- 
plete secondary seventh chord, is much more complex, and does not 
function so readily. 

This chord reaction or memory accounts for our hearing a group 
such as C-F-A-F-C as C-G-C-G-C or C-A-C-A-C; and a group such 
as C-A-F-A-C as C-G-E-G-C. It explains the octave transpositions 
seen in Fig. 4. 

Other instances of error resulting from the chord-structure deter- 
minants are the following: C-A-G-G-C as C-A-F-F-C; C-A-G-G-C 




as C-A-F-F-C; C-D-E-G-C as C-D-E-B-C; C-D-E-G-C as C-D-E-D-C;
C-E-G-E-C as C-"E-G-"E-C. A harmonic error such as this may alter 
the percentile distribution found for the degree-of-skip determinant. 
C-E may be more frequently heard as C-G (parts of the triad C-E-G), 
than as C-F, which is melodically nearer. This is an instance of the 
melodic determinant being modified by the harmonic, and it is wrong 
to assign the C-G preference to melodic attributes. 

3. Contraction.—When we study the frequency with which intervals 
are contracted into smaller intervals and expanded into larger intervals, 
we find a rather pronounced tendency to contract them. The tests 
yielded the following distribution on this point: 








Interval          Contraction, per cent    Expansion, per cent
[illegible]                64                       36
[illegible]                78                       22
[illegible]                75                       25
[illegible]                77                       23











This table divides into two classes: (1) Thirds, and (2) the other inter- 
vals. The distinctive feature for thirds results from the step-skip 
determinant; a third (skip) can be narrowed only to a second (step), 
a change which is a fundamental difference (see III). 

The contraction determinant accounts for the following frequency 
of errors: C-A-F-A-C heard as C-G-E-G-C (expansion) in fifteen per 
cent of cases; C-G-E-G-C heard as C-A-F-A-C (contraction) in 
thirty-seven per cent of cases; C-E-G-E-C heard as C-F-A-F-C 
(expansion) in zero per cent of cases; the reverse (contraction) in 
thirty-five per cent of cases. The psychological reason for this general 
tendency to contract is not easily explained. It may have some 
connection with the Müller-Lyer illusion in optics.


CONCLUSIONS 


In the light of the preceding analysis we can arrange the examples 
given in Fig. 4 on the basis of difficulty of retention and assign the 
causes for the difficulty. 


Ex. 1. Very easy; diatonic progression only; pitch-change symmetrical. 

Ex. 2. Moderately difficult; interval error (degree-of-skip) on first interval 
with retention of repetition and pitch outline. The first error accounts for the 
wide pitch-range (D to C for second tone); the other errors for the retention of this 
wide range from tones 3 and 4, and, to some extent, 5. 







Ex. 3. Easy; concentration of errors on tones 4 and 5 on account of the ease 
(step-wise progression) of the first part: C-D-E. 

Ex. 4. Difficult; wide first interval; frequent change of pitch-direction; 
variety of interval. Hence all variables, no constants. (The ascending first 
interval is easier than the same descending interval of Ex. 18.) 

Ex. 5. Moderate; variety of interval; relative ease of first interval. 

Ex. 6. Easy; basic chord-determinant. 

Ex. 7. Moderately difficult; moderately wide first interval; variety of interval;
retention of pitch-outline. (See Ex. 2.) 

Ex. 8. Moderately difficult; wide first interval; retention of pitch-outline; 
variety of interval. 

Ex. 9. Difficult; all skips; frequent change of pitch direction. Less difficult 
than Ex. 18 or 20 on account of similarity of interval. 

Ex. 10. Moderately difficult; contraction error and chord substitution: 
C-E-G-E-C. Note concentration of errors on this chord form and absence of 
such concentration in Ex. 6. 

Ex. 11. Moderately easy; concentration of difficulty on tone 4; ease of first 
part, diatonic progression. 

Ex. 12. Moderate; contraction error on first interval; retention of pitch out- 
line; mixed outline, similar to Ex. 2. 

Ex. 13. Difficult; descending progression; order of interval. 

Ex. 14. Difficult; width of first interval; descending direction. (See Ex. 8.) 

Ex. 15. Very easy. (See Ex. 1.) 

Ex. 16. Moderately easy; chord determinant, but deueedion, Whereas 
in Ex. 10 the errors are almost entirely a contraction into the triad on C, in Ex. 16 
they are divided between the chord form C-G-E-G-C and the contraction 
C-B-G-B-C. 

Ex. 17. Difficult; variety of intervals; descending progression. (See Ex. 13.) 

Ex. 18. Very difficult; wide descending first interval; frequent change of 
pitch-direction; variety of interval. All variables, no constant. 

Ex. 19. Moderate; chord determinant. 

Ex. 20. Very difficult; see Ex. 18. 


On the basis of this classification we can develop an extended series 
of examples for drill, leading by small increments from very easy to 
very difficult. We can likewise immediately isolate points of difficulty, 
thus eliminating waste of time and effort on non-essential drill. We 
can, further, separate errors that are caused by difficulties in the 
subject-matter from errors caused by individual variation. Thus the 
substitution of C-E-G-E-C for C-F-A-F-C is typical; but the last 
version of Ex. 12, Fig. 1, showing complete pitch-inversion is purely 
individual and of no consequence to other students. The fact that 
melodic memory is an important element in all work in music, even in 
mere appreciation without participation, and the fact that it is dis- 
tinctly trainable, both show the need for improving the teaching pro- 




cedure wherever possible. In outlining such training, the following 
points should be kept in mind. 


1. Begin with melodies of two tones, first tone given. Stepwise progression 
first, then narrow skips. Wide skips last. 

2. Use repetition as the easiest element to remember, first with diatonic 
progression, then with skips. 

3. Use diatonic progression; first with no change in pitch-direction; then with 
one change. 

4. Introduce skips one at a time, preferably at the beginning of the example. 

5. Use triad figures, root position. 

6. Introduce wide skips by restricting them to the first interval. 

7. Introduce more than one change of direction by using interrupted repeti- 
tion, first with diatonic progression, then with skips. The adding of a second pitch- 
change increases the difficulty considerably. 

8. Increase variety of interval in any one example gradually; first without 
change of pitch direction. 

9. Introduce changes in pitch-direction by repeating the same interval. 

10. Reserve examples containing all variables: change of direction, skips, 
variety of interval, until the preceding types have been mastered. 




VARIATION OF IQ’S OBTAINED FROM GROUP TESTS 


W. S. MILLER 
University of Minnesota 


Now that group intelligence tests have been in use for more than 
a decade we may expect reports of repeated examinations of school 
children with longer intervals between the examinations. Hirsch¹
has reported the results of the re-examination annually of a group of 
children in the elementary school. The following tests were adminis- 
tered in the order named: 1, Otis Primary, Form A; 2, Otis Primary, 
Form B; 3, Otis Primary, Form A; 4, Otis Advanced Examination, 
Form A; 5, Otis Advanced, Form B; 6, Otis Advanced, Form A. The 
children were examined annually as they progressed from the first to 
the sixth grade. 

From Table I of Hirsch’s Study (p. 506) containing the individual 
IQ’s for three hundred forty-three subjects, one hundred sixty 
cases who had taken all of the six annual examinations have been 
selected for this report. Table I below gives the Means and SD’s of 
the distributions of IQ’s for each of the annual examinations of these 
one hundred sixty pupils. 


TABLE I.—MEANS AND SD’S OF THE DISTRIBUTIONS OF ONE HUNDRED SIXTY
IQ’S FOR THE SIX-YEAR PERIOD

            I         II        III       IV        V         VI
Mean      103.3     107.25    107.5     107.4     108.1     112.2
SD         13.88     12.94     14.75     16.53     19.2      19.76























An inspection of the means and standard deviations reveals the 
fact that the IQ’s obtained from these six tests are not directly com- 
parable. They can be made directly comparable by converting the 
distributions into distributions having the same mean and standard 
deviation. This can be done conveniently by a method described 
elsewhere by the writer.²
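The idea of bringing two distributions to a common mean and standard deviation can be sketched as a linear rescaling. This is only an illustration under stated assumptions: the writer's own procedure is described in the cited 1924 article and is not reproduced here, the function name `equate` is hypothetical, and the sample IQ's are invented. The target values 107.5 and 14.75 are the third-year mean and SD from Table I.

```python
from statistics import mean, pstdev


def equate(iqs, target_mean, target_sd):
    """Rescale a set of IQ's linearly to a chosen mean and SD."""
    m, s = mean(iqs), pstdev(iqs)
    # Each IQ keeps its standing in SD units; only the scale changes.
    return [target_mean + target_sd * (x - m) / s for x in iqs]


# Hypothetical IQ's from one examination (not data from the study).
sample = [88, 97, 104, 110, 121, 135]
equated = equate(sample, target_mean=107.5, target_sd=14.75)
```

A rescaling of this kind preserves the rank order of the pupils and changes only the mean and spread, which is what makes IQ's from different tests directly comparable.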


1 Hirsch, N. D. M.: An Experimental Study upon Three Hundred School 
Children over a Period of Six Years. Genetic Psychology Monograph, Vol. VII, 
1930, p. 6. 

2 Miller, W. S.: The Variation and Significance of Intelligence Quotients
Obtained from Group Tests. Journal of Educational Psychology, Vol. XV, Sep- 
tember, 1924, pp. 359-366. 







In Table II are recorded the means and SD’s of the distributions 
of IQ’s for the six-year period after they have been converted into
distributions with means and SD’s similar to those for the test in the 
third year. 


TABLE II.—MEANS AND SD’S OF THE DISTRIBUTION OF IQ’S EQUATED IN TERMS
OF THE THIRD-YEAR TEST, OTIS PRIMARY, FORM A

            I         II        III       IV        V         VI
Mean      107.9     107.1     107.5     107.65    107.8     107.5
SD         14.5      14.85     14.75     14.4      14.85     14.5























Inaccuracy in reading the graph probably accounts for lack of 
exact equality of the six means and standard deviations of the equated 
IQ’s.

One is now in position to determine the IQ differences between any 
two of the annual examinations. 

Table III shows the median differences of IQ between the annual 
examinations. Fifteen median differences are recorded. 


TABLE III.—MEDIAN OF DIFFERENCES OF IQ’S OBTAINED IN ANNUAL
EXAMINATIONS

          II      III     IV      V       VI      Average
I         5.2     5.6     6.7     6.1     7.3       4.1
II                5.4     7.9     6.9     7.3       4.7
III                       6.2     6.2     7.1       4.6
IV                                5.1     6.1       4.6
V                                         5.3       3.7
VI                                                  4.8























In comparing these median differences one must reckon with two 
factors, the length of time between the tests and change of tests. 
The differences are italicized where change of test is involved. The 
differences at the extreme left in each row are between tests one year 
apart; the next differences are between tests two years apart and so on 
to the difference between I and VI, five years apart. 

It is difficult to evaluate the effect of increased time between the 
tests since whenever more than two years elapse between them a
change of test is always involved. 













The averages of the median differences for the intervals 1, 2, 3, 4, 
and 5 years are 5.4, 6.4, 6.9, 6.7 and 7.3 respectively. 
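These averages can be checked directly against Table III. The sketch below is an illustration rather than the writer's own computation; the names `MEDIAN_DIFF` and `average_by_interval` are hypothetical. Examinations I to VI are numbered 1 to 6, and each median difference is grouped by the number of years separating the two examinations.

```python
from collections import defaultdict

# Median IQ differences from Table III, keyed by (earlier exam, later exam).
MEDIAN_DIFF = {
    (1, 2): 5.2, (1, 3): 5.6, (1, 4): 6.7, (1, 5): 6.1, (1, 6): 7.3,
    (2, 3): 5.4, (2, 4): 7.9, (2, 5): 6.9, (2, 6): 7.3,
    (3, 4): 6.2, (3, 5): 6.2, (3, 6): 7.1,
    (4, 5): 5.1, (4, 6): 6.1,
    (5, 6): 5.3,
}


def average_by_interval(table):
    """Average the median differences for each interval in years."""
    groups = defaultdict(list)
    for (first, second), value in table.items():
        groups[second - first].append(value)
    return {years: sum(v) / len(v) for years, v in sorted(groups.items())}


averages = average_by_interval(MEDIAN_DIFF)
```

The five resulting averages agree with the figures reported in the text to within rounding of the first decimal place.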

The column at the extreme right of Table III shows the median 
of the differences of IQ’s between each test and the average IQ for the
six tests. 

The fifteen median differences in Table III each involved differences 
between one hundred sixty pairs of examinations. These twenty-four
hundred differences of IQ’s are distributed as follows:








IQ differences    0-4    5-9    10-14   15-19   20-24   25-29   30-34   35-39
Frequency         958    778    434     163     49      14      3       1
Median            6.55
Q                 3.8





























72.3 per cent of the differences are less than ten points of IQ. 
Less than three per cent of the differences exceed twenty points of IQ. 
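The summary figures for this distribution can be recomputed from the class frequencies. The sketch below rests on assumptions not stated in the article: each class is treated as running from its lower limit (0, 5, 10, ...) with a width of five points, quantiles are found by linear interpolation within the class, and the second summary figure (3.8) is read as the semi-interquartile range. The function name `grouped_quantile` is hypothetical.

```python
LOWER_LIMITS = [0, 5, 10, 15, 20, 25, 30, 35]
FREQUENCIES = [958, 778, 434, 163, 49, 14, 3, 1]
WIDTH = 5  # assumed class width in IQ points


def grouped_quantile(p, lower_limits=LOWER_LIMITS, freqs=FREQUENCIES, width=WIDTH):
    """Interpolate the p-th quantile of a grouped frequency distribution."""
    target = p * sum(freqs)
    cumulative = 0
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("p must lie between 0 and 1")


total = sum(FREQUENCIES)
median = grouped_quantile(0.5)
q = (grouped_quantile(0.75) - grouped_quantile(0.25)) / 2

pct_under_ten = 100 * sum(FREQUENCIES[:2]) / total   # classes 0-4 and 5-9
pct_twenty_up = 100 * sum(FREQUENCIES[4:]) / total   # classes 20-24 and above
```

Under these assumptions the frequencies sum to twenty-four hundred, and the interpolated median, the semi-interquartile range, and the two percentages all agree with the reported values to within rounding.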


While these results are quite satisfactory in that they show con- 
siderable constancy of the IQ obtained from these group tests there are 
some individuals who show marked fluctuation as may be seen from 
Table IV in which are given the IQ’s of ten individuals who have the 
largest mean variations. 


TABLE IV.—EQUATED IQ’S OF TEN CHILDREN WITH THE LARGEST MEAN
VARIATIONS IN THE SIX EXAMINATIONS

No.        I       II      III     IV      V       VI      Average   MV     Range
129        115     108     106     85      79      89      97        12.7   36
152        148     145     173     169     155     150     157       9.7    28
210        113     97      94      118     114     121     110       9.2    27
123        133     144     140     111     125     128     130       8.8    33
8          90      103     104     107     89      84      96        8.5    23
98         126     124     147     122     141     129     132       8.5    25
54         122     119     105     99      100     107     109       8.0    23
106        97      118     108     90      94      99      101       8.0    28
118        97      113     111     91      92      96      100       8.0    22
158        110     107     106     108     126     125     114       8.0    20
Average    115.1   117.8   119.4   110.0   111.5   112.8
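The columns headed MV and Range can be illustrated for a single child. The article does not define MV; the sketch assumes it is the mean of the absolute deviations of the six equated IQ's from the child's own average, an assumption that reproduces the tabled figure for child No. 129. The function name `mean_variation` is hypothetical.

```python
def mean_variation(iqs):
    """Mean absolute deviation of a child's IQ's from their own average."""
    average = sum(iqs) / len(iqs)
    return sum(abs(x - average) for x in iqs) / len(iqs)


# Equated IQ's of child No. 129 on examinations I to VI (Table IV).
child_129 = [115, 108, 106, 85, 79, 89]

average = sum(child_129) / len(child_129)
mv = mean_variation(child_129)
iq_range = max(child_129) - min(child_129)
```

For this child the three computed values agree with the Average, MV, and Range columns of the table after rounding.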





Hirsch discusses the causes of some of these extreme fluctuations. 
Seven of the ten have lower average IQ’s on the Advanced Examination
than on the Primary. Five of these seven average thirteen or more 




points lower. Only two of the ten cases average markedly higher on 
the advanced examination. The average of the three primary 
examinations is 117.4 but the average of the three advanced exami- 
nations is 111.4. Since the average IQ of the one hundred sixty 
cases is the same (approx. 107.5) for each year, it is clear that for these 
ten cases the Advanced Examination is rather consistently yielding 
lower ratings than the Primary. A study of the ten cases will reveal, 
however, that all of the fluctuations can not be ascribed to a change of 
tests. 

In Table V are the inter-correlations of the six annual examinations. 
The average of these fifteen correlations is .81, range .74 to .88. 


TABLE V.—CORRELATIONS BETWEEN IQ’S OBTAINED IN THE ANNUAL EXAMINATION
OF ONE HUNDRED SIXTY CHILDREN

          II      III     IV      V       VI      Average
I         .857    .819    .786    .819    .790    .913
II                .856    .740    .781    .765    .885
III                       .788    .813    .760    .911
IV                                .855    .816    .901
V                                         .880    .931
VI                                                .908

PE’s .007 to .024. N = one hundred sixty for each correlation.


The following summary will throw some light on the relation of the 
time interval between the tests and the magnitude of the coefficient of 
correlation. 








No. of r’s    Interval between tests, years    Primary-advanced    Average
    4                       1                         0             .862
    2                       2                         0             .817
    5                       1                         1             .847
    4                       2                         2             .797
    3                       3                         3             .776
    2                       4                         2             .792
    1                       5                         1             .79














In the column headed “Primary-advanced” is recorded the
number of times the correlations involve different tests, the Primary
and the Advanced. From these data it is safe to conclude that both 







the time interval and the change of tests have some relation to the 
correlations. However, the changes in the coefficients are small under 
the conditions indicated above. 
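The averages in the summary can be recomputed from Table V. The sketch below is illustrative only; the names `R` and `average_r` are hypothetical. Examinations I to III used the Primary test and IV to VI the Advanced, so a pair involves a change of test when the earlier examination is numbered 3 or less and the later one 4 or more.

```python
# Correlations from Table V, keyed by (earlier exam, later exam).
R = {
    (1, 2): .857, (1, 3): .819, (1, 4): .786, (1, 5): .819, (1, 6): .790,
    (2, 3): .856, (2, 4): .740, (2, 5): .781, (2, 6): .765,
    (3, 4): .788, (3, 5): .813, (3, 6): .760,
    (4, 5): .855, (4, 6): .816,
    (5, 6): .880,
}


def average_r(table, years, same_test_only=False):
    """Average the correlations between examinations `years` apart.

    With same_test_only=True, pairs that cross from the Primary
    (exams 1-3) to the Advanced (exams 4-6) test are left out.
    """
    rs = [
        r for (a, b), r in table.items()
        if b - a == years and not (same_test_only and a <= 3 < b)
    ]
    return sum(rs) / len(rs)


one_year_same_test = average_r(R, 1, same_test_only=True)
one_year_all = average_r(R, 1)
```

The same call with intervals of two to five years gives the remaining rows of the summary, again to within rounding of the third decimal place.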

It is interesting to compare these results with those obtained from 
examining one hundred sixty pupils (Md. C.A. 13-6) with five group 
tests upon entrance to University High School, all five tests having 
been given between 9:00 and 12:00 on the same day. The pupils took
the five tests in the order named: 


1. Pressey’s Senior Classification Test.
2. Haggerty’s Delta 2, Form A.
3. Terman’s Group Test of Mental Ability, Form A.
4. Miller Mental Ability Test, Form A.
5. Army Alpha, Form 8.


The University High School group has a mean IQ of 118.1, SD 12.2. 
(Equated in terms of Terman Group Test.)

Table VI shows the median difference of IQ’s obtained from the 
five group tests. 


TABLE VI.—MEDIAN OF DIFFERENCES OF IQ’S ON FIVE GROUP TESTS GIVEN ON
THE SAME DAY TO ONE HUNDRED SIXTY ENTERING HIGH SCHOOL PUPILS

            Pressey   Delta 2   Terman   Miller   Average
Alpha         5.3       5.8       4.9      4.5      3.2
Pressey                 6.0       4.7      6.3      3.45
Delta 2                           4.5      6.9      4.17
Terman                                     4.9      3.08
Miller                                              4.16




















This table should be read as follows: The median of the one hundred sixty 
differences of IQ’s obtained on Alpha and Pressey is 5.3 points. 


The ten median differences in Table VI each involved differences 
between one hundred sixty pairs of examinations. These one thousand
six hundred differences of IQ’s are distributed as follows:








IQ differences    0-4    5-9    10-14   15-19   20-24
Frequency         743    565    226     55      11
Median            5.5
Q                 3.15























Eighty-two per cent of the differences are less than ten points of IQ. 
Less than three per cent of the differences exceed twenty points of IQ. 




This median difference (5.5) is lower than the median difference
(6.55) when the examinations extended over a five-year period. This
difference, however, may not be due entirely to the time element. The
group examined on the same morning is older, is a more selected group
(Md. IQ 118.1) and more homogeneous (SD of IQ 12.2). It is possible
that the measuring instruments are more reliable with the older and
more intelligent pupils. Furthermore, the conditions affecting performance
on the tests would be more constant for the group examined
within a period of three hours. It is, indeed, surprising that the results
are so similar. 

In Table VII are the inter-correlations of the IQ’s obtained by the 
tests given on the same morning. 


TABLE VII

            Pressey   Delta 2   Terman   Miller   Average
Alpha        .843      .757      .878     .859     .925
Pressey                .813      .860     .772     .919
Delta 2                          .833     .709     .886
Terman                                    .850     .946
Miller                                             .894

















PE’s .006 to .027. 
N, 160. 


The mean of these inter-correlations is .82, the range .709 to .878. 











TABLE VIII

No.       CA      Alpha   Pressey   Delta   Terman   Miller   Average   MV    Range
160       12-2    125     117       120     137      138      127       8.0   21
47        13-4    119     123       132     113      108      119       6.8   24
68        13-6    113     114       127     122      105      116       6.6   22
100       13-1    113     119       103     106      123      113       6.6   20
14        12-9    126     114       112     129      124      121       6.4   17
29        14-4    108     115       126     109      105      113       6.4   21
36        13-0    109     123       120     120      106      116       6.4   17
136       15-3    83      93        92      79       75       84        6.4   18
57        12-9    125     116       134     133      133      128       6.2   18
91        12-7    119     122       113     116      136      121       6.2   23
Average           114.0   115.6     117.9   116.4    115.3    115.8





































In Table VIII are ten cases whose mean variations of IQ are the
largest in the entire group of one hundred sixty.

The performance of this group is quite erratic, but it must be remembered
that they represent the ten most variable in one hundred sixty;
and yet in one-half of the cases three of the five tests are in close agreement.
Considering the fact that these one hundred sixty children were
assembled in a strange building from widely separated elementary
schools in the Twin Cities, that the examiner and his assistants were
strange, and that the examinations lasted for three hours, the results
are very gratifying and compare very favorably with results of repeated
individual examinations.







THE ORIGIN OF THE PLEASURE-PAIN THEORY OF 
LEARNING 


HULSEY CASON 


University of Wisconsin 


Dr. W. H. Pyle in a recent number of THE JOURNAL OF EDUCATIONAL
PSYCHOLOGY¹ takes issue with the writer’s claim² that the
pleasure-pain theory of learning was originated by Spencer, Bain, 
and Baldwin. Dr. Pyle states that the germ of the theory is found in 
John Locke’s Essay and that it even goes back to Aristotle. One 
can generally assert that anything goes back to Aristotle without 
fear of contradiction, but we are not entirely convinced that this 
modern and scientific theory of learning had its origin in the philo- 
sophical writings of ancient times. 

The pleasure-pain theory of learning which we have in mind 
seems to have been originally advanced by Spencer, Bain, and Baldwin 
as an explanation of the way animals and men adapt themselves 
to their environments, and it was first described, and has at all times 
been discussed, principally in connection with manual movements. 
Spencer, Bain, and Baldwin also discussed the theory in connection 
with such subjects as evolution, heredity, and adaptation to the 
biological environment. This pleasure-pain theory was not an
intellectual descendent of Locke’s views on the association of ideas 
or of Aristotle’s views on the soul. Thorndike’s 1911 formulation of 
the law of effect, however, was a natural descendent of this nineteenth 
century development in genetic psychology. The difference between 
the association of ideas and the soul, on the one hand, and on the other 
the acquisition of motor habits seems to be of some general importance. 

After examining the Spencer-Bain-Baldwin theory of learning³
it appeared that the theory suffered from several defects, among 
which the following can be mentioned: (1) Much of our learning 
does not involve adaptation by means of random manual movements, 
(2) the theories of heredity advanced by Spencer, Bain, and Baldwin 
are incorrect, (3) their physiological theory of pleasure and pain is 
improbable, (4) their pleasure-pain philosophy is inadequate, and 





1 Origin of the Pleasure-pain Theory of Learning. J. Educ. Psychol., Vol. 
XXIV, 1933, pp. 303-304. 

2 The Pleasure-pain Theory of Learning. Psychol. Rev., Vol. XXXIX, 1932,
pp. 440-466. 

3 Op. cit.




(5) their concept of the relation between pleasant and unpleasant 
feelings is questionable. It is not surprising that the explanation 
given by Spencer, Bain, and Baldwin should be deficient because 
later experimental work has removed the scientific basis of their 
claims. The scientific factors involved in this theory of learning are 
not at home in the philosophical speculations of Locke and Aristotle. 

The passage which Dr. Pyle takes from “An Essay Concerning
Human Understanding,” from Book 2, Chapter 10, does not
adequately represent Locke’s views on learning, because at least 
equally representative but very different descriptions of learning can 
be found in other parts of the Essay. In the last chapter of the 
second book, Locke describes the acquisition of everyday antipathies 
and aversions, but he is talking about the association of ideas in this 
place also. In his description of how the names of objects are learned, 
in Book 3, Chapter 9, he says, “If we will observe how children learn
languages, we shall find that to make them understand what the 
names of simple ideas, or substances, stand for, people ordinarily 
show them the thing whereof they would have them have the idea; 
and then repeat to them the name that stands for it, as white, sweet, 
milk, sugar, cat, dog.” The opinions which Locke expressed on the
learning process are on the whole more favorable to the conditioned 
response principle than to the pleasure-pain theory. The acquisition 
of antipathies and aversions, which interested Locke particularly, 
is easily explained by the conditioned response principle, but the 
learning of any unpleasant activity is quite difficult according to 
the pleasure-pain theory of learning, which claims that unpleasant
activities are not learned, or if learned, are stamped out. How, for 
example, can the learning of an unpleasant fear be explained by the 
pleasure-pain theory? 

The passage which Dr. Pyle has resurrected from Aristotle’s 
De Anima is as follows: “Sensation, then, is analogous to simple
assertion or simple apprehension by thought and, when the sensible 
thing is pleasant or painful, the pursuit or avoidance of it by the 
soul is a sort of affirmation or negation. In fact, to feel pleasure or 
pain is precisely to function with the sensitive mean, acting upon 
good or evil as such. It is in this that actual avoidance and actual 
appetition consist.” As a modern scientific theory of learning, this
almost incomprehensible passage is not entirely satisfactory. Several 
old but more pertinent passages can be found in Aristotle’s Rhetoric, 
Plato’s Republic, and Plato’s Philebus, but this does not seem to be 




especially significant for modern psychology, because Aristotle, 
Plato, and even Anaxagoras, from whom Plato obtained some of his 
ideas on pleasure and pain, were primarily interested in philosophical 
problems, and they have not directly influenced what may be properly 
referred to as the psychology of learning. The views of Spencer,
Bain, and Baldwin have definitely influenced the trend of modern
discussions.

Several experimental studies which have been reported in the
technical journals have shown the inadequacy of the pleasure-pain
theory of learning,¹ and these and other studies have given some
support to an explanation of learning that is similar in some respects
to the conditioned response principle, a principle which has been
underestimated and neglected by some of the leading students of the
psychology of learning.





1 Some references to this literature may be found in the writer’s paper, The
Learning and Retention of Pleasant and Unpleasant Activities. Archives of
Psychol., Vol. XXI, No. 134, 1932, pp. 96.










BOOK REVIEWS 


L. L. THURSTONE, The Theory of Multiple Factors, Chicago: Author,
1933. Pp. vii + 65. 


The method of factor pattern analysis presented in this litho- 
printed booklet, like the method broached by Professor Hotelling 
elsewhere in this issue, is a mathematician’s response to the cry 
of the psychologist for an objective statistical method of investigation 
in the field of mental traits. The derivation is a straight-forward 
application of the method of least squares to the prediction of the 
intercorrelations of n variables, first by one, then by two, then by 
three, up to any number of mutually independent underlying factors. 
This means that in the solution of a given problem the loadings of 
the first factor in the variables are such that the sum of the squares 
of the differences between the raw correlations and the correlation 
that is due to the first factor loadings, is a minimum, i.e., the sum of 
squares of differences is less than any other set of loadings would 
yield; similarly, the loadings of the second factor are such that the 
sum of the squares of differences between the residual correlations 
left unexplained by the first factor, and the correlation due to the 
second factor loadings, is a minimum; and so on, for as many factors 
as the investigator chooses to determine. 
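
The successive least-squares fitting described above can be illustrated with a short sketch. It is not the computing routine of the booklet; it rests on several assumptions made here for illustration only: the fit is taken over the off-diagonal correlations alone, each loading is corrected in turn while the others are held fixed, every trial value is 0.5, and the function names are hypothetical.

```python
def fit_one_factor(r, sweeps=200, trial=0.5):
    """Loadings a such that a[i] * a[j] approximates r[i][j] for i != j.

    Assumption: only off-diagonal entries are fitted, and each loading is
    replaced in turn by its least-squares value given the other loadings.
    """
    n = len(r)
    a = [trial] * n  # trial values, improved by successive approximations
    for _ in range(sweeps):
        for i in range(n):
            num = sum(r[i][j] * a[j] for j in range(n) if j != i)
            den = sum(a[j] ** 2 for j in range(n) if j != i)
            if den > 0:
                a[i] = num / den
    return a


def residuals(r, a):
    """Correlations left unexplained by the loadings a (diagonal set to 0)."""
    n = len(r)
    return [[r[i][j] - a[i] * a[j] if i != j else 0.0 for j in range(n)]
            for i in range(n)]


def sum_sq(m):
    """Sum of squared off-diagonal entries of a square table."""
    n = len(m)
    return sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j)


def extract_factors(r, k, sweeps=200):
    """Fit k factors in succession, each to the residuals of the last."""
    loadings = []
    resid = [row[:] for row in r]
    for _ in range(k):
        a = fit_one_factor(resid, sweeps)
        loadings.append(a)
        resid = residuals(resid, a)
    return loadings, resid


if __name__ == "__main__":
    # Illustrative loadings only; not data from the booklet under review.
    true = [0.9, 0.8, 0.7, 0.6]
    r = [[1.0 if i == j else true[i] * true[j] for j in range(4)]
         for i in range(4)]
    loadings, resid = extract_factors(r, 1)
    print(loadings[0], sum_sq(resid))
```

Each correction of a single loading is an exact least-squares step, so the sum of squared differences cannot increase from one step to the next. The second and later factors are fitted to whatever the earlier factors leave unexplained, as in the description above.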

It has already been demonstrated that problems have no unique 
factor pattern solution, not even a unique linear solution. The 
solution by this method, then, is not unique, but rather is a particular
solution, objectively based on all the correlations of the table. No 
preconceived schematization of factor weights by the investigator 
can influence the solution; the method yields a single, and in that sense 
unique, solution. 

A data sheet for necessary computations has been worked out. 
The author deserves special commendation for providing not only 
detailed, step-by-step instructions, but also the original data sheets 
for the solution of a nine-variable problem in terms of three factors. 
The nine variables are the nine sub-tests of the College Entrance 
Examination Board’s Scholastic Aptitude Test, Form B, 1927, which 
have already been subjected to tetrad analysis by Brigham and Brolyer. 

The determination of the factor loadings in a given problem 
proceeds from trial values by approximations to final, or sufficiently 
approximate, values. The number of approximations required varies 
with the number of places desired in the solution. The author’s 



estimate of one and one-half to two hours machine work for a given 
approximation is borne out by this reviewer’s experience with a six- 
variable problem in which six-place figures were carried throughout 
the computations. 

The weakness of the method in many problems stems from the 
purity of its mathematical conception. The method of least squares 
typically leads to such loadings of the first factor that the first factor 
residuals include many negative values, some of approximately 
the same order as the largest positive residuals. This requires that 
the second and later factors have negative loadings in some variables 
and positive loadings in others. Traits which behave in such fashion, 
enhancing the individual’s score on one test and operating to reduce 
his score on another test, are rarely met in the mental testing field, 
for example, where factor analysis is most used. It would therefore 
seem necessary to devise some modification of such a method as 
Professor Thurstone offers, so as to preserve the objective character 
of the solution, but to remove the anomaly of factors acting with 
opposite signs in different variables and to substitute a psychologically 
more defensible solution involving factors bearing only positive 
weights. WARREN G. FINDLEY.

Cooper Union Institute of Technology. 


WALTER C. RECKLESS and MAPHEUS SMITH. Juvenile Delinquency.
New York: McGraw-Hill, 1932. Pp. XIII + 412. 


There is much talk about the seriousness of the crime problem. 
But many of the people who are doing the talking seem to believe that 
the problem can be explained by blaming it on a “lawless heritage”
and can be solved by setting up crime commissions which start
operations with war whoops and offer as remedies such things as “swift
justice,” rough treatment, or tinkering with court procedure. A
reading of Juvenile Delinquency by Reckless and Smith would be a good 
educational dose for the foregoing individuals. From it they might 
learn that human nature can be changed. But that predictable 
changes presuppose intelligent management. And intelligent man- 
agement presupposes understanding. An understanding not only of 
legal involvements but of all relevant factors—biological, psychological, 
social. 

Both the statistical and psychological as well as the clinical and 
sociological contributions are adequately considered in this book. 
Topics considered include: Statistics about the delinquent population,



































their offenses and dispositions; physical and mental traits; truancy; 
the juvenile court and institutional care; clinical causation, preventive 
programs and an evaluation of treatment. Three case studies and 
an outline for the clinical presentation of published and original cases 
of problem children are included in the appendix. An index is included.
In their presentation of facts the authors manifest an understanding 
of the meanings of relevant studies as well as a familiarity with such 
studies and criticisms of them. Illustrations include: A discussion of 
Sutherland’s criticism of Goring’s study of “The English Convict”
as well as a description of the study; a consideration of “blind spots”
in researches of family histories such as the Kallikaks; the inclusion
of Mohr and Gundlach’s and other check-ups on Kretschmer types as
well as a discussion of the types. In their evaluations the authors show 
themselves to be well informed, critical and well balanced. Their 
treatment of causation is a case in point. Aware of the inadequacy of 
any single factor explanation they favor the multiple causation but 
with this reservation—all discoverable conditions are not necessarily 
causes. 

Of special interest to educational psychologists will be the fairly 
comprehensive chapter on “Truancy and School Maladjustment.”
In this chapter the work of visiting teachers, school clinics, vocational 
education, character training and the Parent-Teacher movement are 
considered as well as the facts about the emergence of the truant 
pattern. 

The psychoanalytic contributions are the ones not sufficiently 
emphasized. And for this omission psychoanalysts will likely condemn 
the authors. On the other hand, inclusion would have brought con- 
demnation from some academic psychologists and sociologists. Appar- 
ently the authors did not choose to overload. After all, individuals 
who want the psychoanalytic contributions can get them elsewhere 
easily enough. All in all, this book contains what the present reviewer
considers the most comprehensive presentation of juvenile delinquency 
which has appeared to date in the covers of one volume. 

St. Louis, Missouri. H. MELTZER.








